Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Indexing for Graph Query Evaluation

  • George Fletcher
  • Martin Theobald
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_212-1

Definitions

Given a graph, an index is a data structure supporting a map from a collection of keys to a collection of elements in the graph. For example, we may have an index on node labels, which, given a node label as search key, facilitates accelerated access to all nodes of the graph having the given label. The evaluation of queries on graph databases is often facilitated by index data structures. An index can be the primary representation of the graph or can be a secondary access path to elements of the graph.
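The node-label example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a structure taken from the entry: the toy graph and the names `build_label_index` and `nodes_with_label` are hypothetical, and the index is a plain in-memory dictionary acting as a secondary access path over a node table.

```python
from collections import defaultdict

def build_label_index(node_labels):
    """Build a map from each label (the search key) to the set of nodes carrying it."""
    index = defaultdict(set)
    for node, label in node_labels.items():
        index[label].add(node)
    return index

def nodes_with_label(index, label):
    """Look up all nodes having the given label; an unknown label yields an empty set."""
    return index.get(label, set())

# Hypothetical toy graph: node id -> node label.
node_labels = {1: "Person", 2: "Person", 3: "City"}

# The index is a secondary access path: the graph itself stays in node_labels.
label_index = build_label_index(node_labels)
person_nodes = nodes_with_label(label_index, "Person")
```

With the index in place, finding all nodes with a given label is a single dictionary lookup rather than a scan over every node of the graph.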

Overview

In this article, we give a succinct overview of the main approaches to indexing graphs for efficient graph query evaluation. We focus our discussion on exact query processing and do not consider lossy graph representations. Rather than aiming for an exhaustive survey, we illustrate each approach with select exemplars which highlight the main ideas of the approach.

Key Research Findings

There is a wealth of work on graph indexing, which can be organized along three...

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Technische Universiteit Eindhoven, Eindhoven, Netherlands
  2. Université du Luxembourg, Luxembourg, Luxembourg

Section editors and affiliations

  • Hannes Voigt (1)
  • George Fletcher (2)
  1. Dresden Database Systems Group, Technische Universität Dresden, Dresden, Germany
  2. Department of Mathematics and Computer Science, Eindhoven University of Technology