Abstract
Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing reachability methods trade off indexing time and space against query time. Their biggest limitation, however, is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and its query time ranges from constant to linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
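The abstract names randomized interval labeling without spelling out the procedure, so the following Python sketch is only an illustration of the general idea, not the authors' C++ implementation. It assumes the input is already a DAG (for example after collapsing strongly connected components), assigns each node one interval per random post-order traversal, and answers a query by first checking interval containment and then falling back to a pruned depth-first search. The names `build_labels`, `contains`, `reachable`, and `num_traversals` are hypothetical.

```python
import random


def build_labels(graph, num_traversals=3, seed=0):
    """Assign each node one (low, post) interval per random traversal.

    graph: dict mapping node -> list of successors; assumed to be a DAG.
    """
    rng = random.Random(seed)
    targets = {v for succs in graph.values() for v in succs}
    nodes = list(graph) + [v for v in targets if v not in graph]
    roots = [u for u in nodes if u not in targets]
    labels = {u: [] for u in nodes}

    for _ in range(num_traversals):
        interval = {}
        rank = 1

        def visit(u):
            # Recursive post-order DFS; fine for a small illustration only.
            nonlocal rank
            if u in interval:
                return
            children = list(graph.get(u, ()))
            rng.shuffle(children)  # the randomization step
            low = None
            for c in children:
                visit(c)
                c_low = interval[c][0]
                low = c_low if low is None else min(low, c_low)
            # low = smallest rank among descendants, post = own post-order rank
            interval[u] = (low if low is not None else rank, rank)
            rank += 1

        order = roots[:]
        rng.shuffle(order)
        for r in order:
            visit(r)
        for u in nodes:
            labels[u].append(interval[u])
    return labels


def contains(outer, inner):
    """True if interval `inner` lies inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(graph, labels, u, v):
    """Answer whether v is reachable from u using the labels as a filter."""
    def may_reach(x):
        # If any traversal's interval fails to contain v's, x cannot reach v.
        return all(contains(a, b) for a, b in zip(labels[x], labels[v]))

    if u == v:
        return True
    if not may_reach(u):
        return False  # constant-time rejection
    # Containment can be a false positive, so confirm with a pruned DFS.
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in graph.get(x, ()):
            if c == v:
                return True
            if c not in seen and may_reach(c):
                seen.add(c)
                stack.append(c)
    return False


# Small example DAG (made up for illustration).
dag = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: [], 6: [4]}
labels = build_labels(dag, num_traversals=3)
print(reachable(dag, labels, 0, 5), reachable(dag, labels, 1, 4))
```

In this sketch the labels never produce a false negative: a descendant finishes before its ancestors in a post-order traversal of a DAG, so its interval always nests inside theirs. Extra traversals only reduce the number of false positives that reach the fallback search, which is one way to read the stated range from constant to linear query time.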
Additional information
This work was supported in part by NSF Grants EMT-0829835, and CNS-0103708, and NIH Grant 1R01EB0080161-01A1.
About this article
Cite this article
Yıldırım, H., Chaoji, V. & Zaki, M.J. GRAIL: a scalable index for reachability queries in very large graphs. The VLDB Journal 21, 509–534 (2012). https://doi.org/10.1007/s00778-011-0256-4
DOI: https://doi.org/10.1007/s00778-011-0256-4