A survey of RDF stores & SPARQL engines for querying knowledge graphs

  • Regular Paper
  • The VLDB Journal

Abstract

RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend to focus on the distributed setting, the main focus of the work is on providing a comprehensive survey of state-of-the-art storage, indexing and query processing techniques for efficiently evaluating SPARQL queries in a local setting (on one machine). To keep the survey self-contained, we also provide a short discussion on graph partitioning techniques used in the distributed setting. We conclude by discussing contemporary research challenges for further improving SPARQL query engines. An extended version also provides a survey of over one hundred SPARQL query engines and the techniques they use, along with twelve benchmarks and their features.

Notes

  1. In this paper, we abbreviate the union of sets \(M_1 \cup \ldots \cup M_n\) with \(M_1 \ldots M_n\). Hence, \(\mathbf{I}\mathbf{B}\mathbf{L}\) stands for \(\mathbf{I} \cup \mathbf{B} \cup \mathbf{L}\).

  2. We use the blank prefix (e.g., :DB) as an arbitrary example. Other prefixes used can be retrieved at http://prefix.cc/.

  3. SPARQL uses the syntax !(\(p_1\)|\(\ldots \)|\(p_k\)|\(p_{k+1}\)|\(\ldots \)|\(p_n\)) which can be written as , where \(P = \{p_1,\ldots ,p_k\}\) and \(P' = \{p_{k+1},\ldots ,p_n \}\) [50, 74].

  4. The definition of MINUS is slightly different from anti-join in that mappings with no overlapping variables on the right are ignored.

  5. We relax the typical requirement for a set partition that \(G_i \cap G_j = \emptyset \) for all \(1 \le i < j \le n\) to allow for the possibility of replication or other forms of redundancy.

  6. Given a graph, deciding if there is a k-way partition with fewer than n edges between partitions is NP-complete.
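
The distinction drawn in Note 4 can be illustrated with a minimal Python sketch. It assumes that solution mappings are modelled as dictionaries from variable names to RDF terms; the function names (`compatible`, `anti_join`, `minus`) and the sample terms are illustrative only and are not taken from the paper or from any particular engine.

```python
from typing import Dict, List

# Assumption: a solution mapping is a dict from variable name to RDF term.
Mapping = Dict[str, str]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == t for v, t in m1.items() if v in m2)


def anti_join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """Keep the left mappings that are compatible with no right mapping."""
    return [m for m in left if not any(compatible(m, r) for r in right)]


def minus(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """MINUS-style difference: a right mapping removes a left mapping only
    if the two are compatible AND share at least one variable."""
    return [
        m for m in left
        if not any(compatible(m, r) and not m.keys().isdisjoint(r) for r in right)
    ]


left = [{"x": ":a"}, {"x": ":b"}]
right_shared = [{"x": ":a", "y": ":c"}]   # shares variable x with the left side
right_disjoint = [{"z": ":c"}]            # shares no variable with the left side

print(anti_join(left, right_shared), minus(left, right_shared))
print(anti_join(left, right_disjoint), minus(left, right_disjoint))
```

When the two sides share a variable, both operators return the same result. When they share none, every pair of mappings is trivially compatible, so the anti-join discards all left mappings, whereas MINUS ignores the right-hand mappings and keeps the left side unchanged.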

References

  1. Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for Semantic Web data management. VLDB J. 18(2), 385–406 (2009)

  2. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. PVLDB 10(13), 2049–2060 (2017)

  3. Abdelaziz, I., Harbi, R., Salihoglu, S., Kalnis, P.: Combining vertex-centric graph processing with SPARQL for large-scale RDF data analytics. IEEE TPDS 28(12), 3374–3388 (2017)

  4. Abul-Basher, Z.: Multiple-query optimization of regular path queries. In: International Conference on Data Engineering (ICDE), pp. 1426–1430. IEEE (2017)

  5. Akhter, A., Ngonga, A.-C.N., Saleem, M.: An empirical evaluation of RDF graph partitioning techniques. In: European Knowledge Acquisition Workshop, pp. 3–18. Springer (2018)

  6. Alaoui, K.: A categorization of RDF triplestores. In: International Conference on Smart City Applications (SCA), pp. 1–7. ACM (2019)

  7. Ali, W., Saleem, M., Yao, B., Hogan, A., Ngomo, A.N.: A survey of RDF stores & SPARQL engines for querying knowledge graphs. CoRR arXiv:2102.13027 (2021)

  8. Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J.L., Vrgoc, D.: Foundations of modern query languages for graph databases. ACM CSUR 50(5), 68:1-68:40 (2017)

  9. Arroyuelo, D., Hogan, A., Navarro, G., Reutter, J.L., Rojas-Ledesma, J., Soto, A.: Worst-case optimal graph joins in almost no space. In: SIGMOD International Conference on Management of Data, pp. 102–114. ACM (2021)

  10. Atre, M., Hendler, J.A.: BitMat: a main memory bit-matrix of RDF triples. In: Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), p. 33 (2009)

  11. Atserias, A., Grohe, M., Marx, D.: Size bounds and query plans for relational joins. SIAM J. Comput. 42(4), 1737–1767 (2013)

  12. Baier, J., Daroch, D., Reutter, J.L., Vrgoc, D.: Evaluating navigational RDF queries over the Web. In: ACM Conference on Hypertext and Social Media (HT), pp. 165–174. ACM (2017)

  13. Banane, M.: RDFMongo: a MongoDB distributed and scalable RDF management system based on meta-model. Int. J. Adv. Trends Comput. Sci. Eng. 8, 734–741 (2019)

  14. Bartoň, S.: Designing indexing structure for discovering relationships in RDF graphs. In: International Workshop on DAtabases, TExts, Specifications and Objects (DATESO), pp. 7–17. CEUR (2004)

  15. Beeri, C., Ramakrishnan, R.: On the power of magic. J. Log. Program. 10(3 & 4), 255–299 (1991)

  16. Binna, R., Gassler, W., Zangerle, E., Pacher, D., Specht, G.: SpiderStore: a native main memory approach for graph storage. In: Grundlagen von Datenbanken (GI-Workshop), pp. 91–96. CEUR (2011)

  17. Bizer, C., Meusel, R., Primpel, A.: Web Data Commons—Microdata, RDFa, JSON-LD, and Microformat Data Sets. http://webdatacommons.org/structureddata/ (2020)

  18. Bornea, M.A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., Bhattacharjee, B.: Building an efficient RDF store over a relational database. In: International Conference on Management of Data (SIGMOD), pp. 121–132. ACM (2013)

  19. Brickley, D., Guha, R.: RDF schema 1.1. W3C recommendation. https://www.w3.org/TR/rdf-schema/ (2014)

  20. Brisaboa, N.R., Cerdeira-Pena, A., de Bernardo, G., Fariña, A.: Revisiting compact RDF stores based on k²-trees. In: Data Compression Conference (DCC), pp. 123–132. IEEE (2020)

  21. Brisaboa, N.R., Cerdeira-Pena, A., Fariña, A., Navarro, G.: A compact RDF store using suffix arrays. In: String Processing and Information Retrieval (SPIRE), pp. 103–115. Springer (2015)

  22. Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: DOGMA: a disk-oriented graph matching algorithm for RDF databases. In: International Semantic Web Conference (ISWC), pp. 97–113. Springer (2009)

  23. Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y.: SPARQL web-querying infrastructure: ready for action? In: International Semantic Web Conference (ISWC), pp. 277–293. Springer (2013)

  24. Buluç, A., Meyerhenke, H., Safro, I., Sanders, P., Schulz, C.: Recent advances in graph partitioning. In: Algorithm Engineering, pp. 117–158. Springer (2016)

  25. Callé, C., Cure, O., Calvez, P.: Motivations for an analytical RDF database system. https://openreview.net/forum?id=M4H2AdgOhFX (2021)

  26. Cappellari, P., Virgilio, R.D., Roantree, M.: Path-oriented keyword search over graph-modeled Web data. World Wide Web Conference (WWW) 15(5–6), 631–661 (2012)

  27. Cebiric, S., Goasdoué, F., Kondylakis, H., Kotzinos, D., Manolescu, I., Troullinou, G., Zneika, M.: Summarizing semantic graphs: a survey. VLDBJ 28(3), 295–327 (2019)

  28. Chantrapornchai, C., Choksuchat, C.: TripleID-Q: RDF query processing framework using GPU. IEEE TPDS 29(9), 2121–2135 (2018)

  29. Chawla, T., Singh, G., Pilli, E., Govil, M.: Storage, partitioning, indexing and retrieval in Big RDF frameworks: a survey. Comput. Sci. Rev. 38, 100309 (2020)

  30. Chen, Y., Özsu, M.T., Xiao, G., Tang, Z., Li, K.: GSmart: an efficient SPARQL query engine using sparse matrix algebra—full version (2021)

  31. Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL-based RDF querying scheme. In: International Conference on Very Large Databases (VLDB), pp. 1216–1227. VLDB End. (2005)

  32. Corby, O., Faron-Zucker, C., Gandon, F.: LDScript: a linked data script language. In: International Semantic Web Conference (ISWC), LNCS. vol. 10587, pp. 208–224. Springer (2017)

  33. Dey, S.C., Cuevas-Vicenttín, V., Köhler, S., Gribkoff, E., Wang, M., Ludäscher, B.: On implementing provenance-aware regular path queries with relational query engines. In: Joint 2013 EDBT/ICDT Conferences, pp. 214–223. ACM (2013)

  34. Duerst, M., Suignard, M.: Internationalized resource identifiers (IRIs). RFC 3987

  35. Elzein, N.M., Majid, M.A., Hashem, I.A.T., Yaqoob, I., Alaba, F.A., Imran, M.: Managing big RDF data in clouds: challenges, opportunities, and solutions. Sustain. Cities Soc. 39, 375–386 (2018)

  36. Erling, O., Mikhailov, I.: Virtuoso: RDF Support in a Native RDBMS, pp. 501–519. Springer, Berlin (2010)

  37. Faye, D.C., Curé, O., Blin, G.: A survey of RDF storage approaches. In: Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, p. 25 (2012)

  38. Fionda, V., Pirrò, G., Consens, M.P.: Querying knowledge graphs with extended property paths. Semant. Web 10(6), 1127–1168 (2019)

  39. Fletcher, G.H.L., Peters, J., Poulovassilis, A.: Efficient regular path query evaluation using path indexes. In: Extending Database Technology (EDBT), pp. 636–639. OpenProceedings.org (2016)

  40. Galárraga, L., Hose, K., Schenkel, R.: Partout: a distributed engine for efficient RDF processing. In: WWW Companion, pp. 267–268. ACM (2014)

  41. Galkin, M., Endris, K.M., Acosta, M., Collarana, D., Vidal, M., Auer, S.: SMJoin: a multi-way join operator for SPARQL queries. In: International Conference on Semantic Systems (SEMANTICS), pp. 104–111. ACM (2017)

  42. Groppe, S., Groppe, J., Linnemann, V.: Using an index of precomputed joins in order to speed up SPARQL processing. In: International conference on enterprise information systems (ICEIS), pp. 13–20 (2007)

  43. Gubichev, A., Bedathur, S.J., Seufert, S.: Sparqling kleene: fast property paths in RDF-3X. In: Workshop on graph data management experiences & systems (GRADES), p. 14. CWI/ACM (2013)

  44. Gubichev, A., Neumann, T.: Path query processing on very large RDF graphs. In: International workshop on the web and databases (WebDB) (2011)

  45. Gubichev, A., Neumann, T.: Exploiting the query structure for efficient join ordering in SPARQL queries. In: International conference on extending database technology (EDBT), pp. 439–450. OpenProceedings.org (2014)

  46. Haase, P., Broekstra, J., Eberhart, A., Volz, R.: A comparison of RDF query languages. In: International Semantic Web Conference (ISWC), pp. 502–517. Springer (2004)

  47. Hammoud, M., Rabbou, D.A., Nouri, R., Beheshti, S.M.R., Sakr, S.: DREAM: distributed RDF engine with adaptive query planner and minimal communication. PVLDB 8(6), 654–665 (2015)

  48. Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N., Ebrahim, Y., Sahli, M.: Accelerating SPARQL queries by exploiting hash-based locality and adaptive partitioning. VLDBJ 25(3), 355–380 (2016)

  49. Harris, S., Lamb, N., Shadbolt, N.: 4store: The design and implementation of a clustered RDF store. In: International Workshop on Scalable Semantic Web Systems (SSWS), pp. 94–109 (2009)

  50. Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C recommendation. http://www.w3.org/TR/sparql11-query/ (2013)

  51. Harth, A., Decker, S.: Optimized index structures for querying RDF from the Web. In: Latin American Web Congress (LA-WEB), pp. 71–80. IEEE (2005)

  52. Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: International Semantic Web Conference (ISWC), pp. 211–224. Springer (2007)

  53. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S.: OWL 2 Web ontology language primer. W3C Recommendation https://www.w3.org/TR/owl2-primer/ (2012)

  54. Hogan, A., Reutter, J.L., Soto, A.: In-database graph analytics with recursive SPARQL. In: International semantic web conference (ISWC), pp. 511–528. Springer (2020)

  55. Hogan, A., Riveros, C., Rojas, C., Soto, A.: A worst-case optimal join algorithm for SPARQL. In: International Semantic Web Conference (ISWC), pp. 258–275. Springer (2019)

  56. Hogenboom, A., Frasincar, F., Kaymak, U.: Ant colony optimization for RDF chain queries for decision support. Expert Syst. Appl. 40(5), 1555–1563 (2013)

  57. Hogenboom, A., Milea, V., Frasincar, F., Kaymak, U.: RCQ-GA: RDF chain query optimization using genetic algorithms. In: E-Commerce and Web Technologies (EC-Web), pp. 181–192. Springer (2009)

  58. Hose, K., Schenkel, R.: WARP: Workload-aware replication and partitioning for RDF. In: ICDE Workshops, pp. 1–6 (2013)

  59. Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. PVLDB 4(11), 1123–1134 (2011)

  60. Ingalalli, V., Ienco, D., Poncelet, P.: Chapter 5, Querying RDF Data: A Multigraph-based Approach, pp. 135–165. Wiley, New York (2018)

  61. Ioannidis, Y.E., Wong, E.: Query optimization by simulated annealing. In: International Conference on Management of Data (SIGMOD), pp. 9–22. ACM (1987)

  62. Jachiet, L., Genevès, P., Gesbert, N., Layaïda, N.: On the optimization of recursive relational queries: application to graph queries. In: SIGMOD International Conference on Management of Data (SIGMOD), pp. 681–697. ACM (2020)

  63. Jamour, F.T., Abdelaziz, I., Chen, Y., Kalnis, P.: Matrix algebra framework for portable, scalable and efficient query engines for RDF Graphs. In: EuroSys Conference, pp. 27:1–27:15. ACM (2019)

  64. Janke, D., Staab, S.: Storing and querying semantic data in the Cloud. In: Reasoning Web Summer School, pp. 173–222. Springer (2018)

  65. Janke, D., Staab, S., Thimm, M.: Koral: A glass box profiling system for individual components of distributed RDF stores. In: Workshop on Benchmarking Linked Data (BLINK). CEUR (2017)

  66. Janke, D., Staab, S., Thimm, M.: On data placement strategies in distributed RDF stores. In: International Workshop on Semantic Big Data (SBD), pp. 1–6. ACM (2017)

  67. Kalayci, E.G., Kalayci, T.E., Birant, D.: An ant colony optimisation approach for optimising SPARQL queries by reordering triple patterns. Inf. Syst. 50, 51–68 (2015)

  68. Kalinsky, O., Mishali, O., Hogan, A., Etsion, Y., Kimelfeld, B.: Efficiently charting RDF. CoRR arXiv:1811.10955 (2018)

  69. Kaoudi, Z., Manolescu, I.: RDF in the clouds: a survey. VLDB J. 24(1), 67–91 (2015)

  70. Karvounarakis, G., Magkanaraki, A., Alexaki, S., Christophides, V., Plexousakis, D., Scholl, M., Tolle, K.: Querying the semantic web with RQL. Comput. Net. 42(5), 617–640 (2003)

  71. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)

  72. Katib, A., Slavov, V., Rao, P.: RIQ: fast processing of SPARQL queries on RDF quadruples. J. Web Semant. 37, 90–111 (2016)

  73. Koschmieder, A., Leser, U.: Regular path queries on large graphs. In: International Conference on Scientific and Statistical Database Management (SSDBM), LNCS. vol. 7338, pp. 177–194. Springer (2012)

  74. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoc, D.: SPARQL with property paths. In: International Semantic Web Conference (ISWC), pp. 3–18. Springer (2015)

  75. Kyzirakos, K., Karpathiotakis, M., Koubarakis, M.: Strabon: a semantic geospatial DBMS. In: International Semantic Web Conference (ISWC), pp. 295–311. Springer (2012)

  76. Ladwig, G., Harth, A.: CumulusRDF: linked data management on nested key-value stores (2011)

  77. Lampo, T., Vidal, M., Danilow, J., Ruckhaus, E.: To cache or not to cache: the effects of warming cache in complex SPARQL queries. In: On the Move to Meaningful Internet Systems (OTM), pp. 716–733. Springer (2011)

  78. Le, W., Kementsietsidis, A., Duan, S., Li, F.: Scalable multi-query optimization for SPARQL. In: International Conference on Data Engineering (ICDE), pp. 666–677. IEEE (2012)

  79. Letelier, A., Pérez, J., Pichler, R., Skritek, S.: Static analysis and optimization of semantic web queries. ACM TODS 38(4), 25:1-25:45 (2013)

  80. Liu, B., Hu, B.: HPRD: a high performance RDF database. Int. J. Parallel Emerg. Distrib. Syst. 25(2), 123–133 (2010)

  81. Lorey, J., Naumann, F.: Caching and prefetching strategies for SPARQL queries. In: ESWC Satellite Events, pp. 46–65. Springer (2013)

  82. Luo, Y., Picalausa, F., Fletcher, G.H.L., Hidders, J., Vansummeren, S.: Storing and indexing massive RDF datasets. In: Semantic Search over the Web, pp. 31–60. Springer (2012)

  83. Lyu, X., Wang, X., Li, Y., Feng, Z., Wang, J.: GraSS: an efficient method for RDF subgraph matching. In: Web Information Systems Engineering Conference (WISE), pp. 108–122. Springer (2015)

  84. Ma, Z., Capretz, M.A., Yan, L.: Storing massive Resource Description Framework (RDF) data: a survey. Knowl. Eng. Rev. 31(4), 391–413 (2016)

  85. Madkour, A., Aly, A.M., Aref, W.G.: WORQ: workload-driven RDF query processing. In: International Semantic Web Conference (ISWC), pp. 583–599. Springer (2018)

  86. Maharjan, R., Lee, Y., Lee, S.: Exploiting path indexes to answer complex queries in ontology repository. In: International Conference on Computational Science and Its Applications (ICCSA), pp. 56–61. IEEE (2009)

  87. Malyshev, S., Krötzsch, M., González, L., Gonsior, J., Bielefeldt, A.: Getting the most out of Wikidata: semantic technology usage in Wikipedia’s knowledge graph. In: International Semantic Web Conference (ISWC), pp. 376–394. Springer (2018)

  88. Martens, W., Trautner, T.: Evaluation and enumeration problems for regular path queries. In: International Conference on Database Theory (ICDT), pp. 19:1–19:21 (2018)

  89. Martin, M., Unbehauen, J., Auer, S.: Improving the performance of semantic web applications with SPARQL query caching. In: Extended Semantic Web Conference (ESWC), pp. 304–318. Springer (2010)

  90. McGlothlin, J.P., Khan, L.R.: RDFJoin: a scalable data model for persistence and efficient querying of RDF datasets. Technical Report UTDCS-08-09, University of Texas at Dallas (2009)

  91. Meimaris, M., Papastefanatos, G.: Distance-based triple reordering for SPARQL query optimization. In: International Conference on Data Engineering (ICDE), pp. 1559–1562. IEEE Computer Society (2017)

  92. Meimaris, M., Papastefanatos, G., Mamoulis, N., Anagnostopoulos, I.: Extended characteristic sets: graph indexing for SPARQL query optimization. In: International Conference on Data Engineering (ICDE), pp. 497–508. IEEE (2017)

  93. Metzler, S., Miettinen, P.: On defining SPARQL with Boolean tensor algebra. CoRR arXiv:1503.00301 (2015)

  94. Minier, T., Skaf-Molli, H., Molli, P.: SaGe: Web preemption for public SPARQL query services. In: World Wide Web Conference (WWW), pp. 1268–1278. ACM (2019)

  95. Miura, K., Amagasa, T., Kitagawa, H.: Accelerating regular path queries using FPGA. In: International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS@VLDB), pp. 47–54 (2019)

  96. Navarro, G., Reutter, J.L., Rojas-Ledesma, J.: Optimal joins using compact data structures. In: International Conference on Database Theory (ICDT), pp. 21:1–21:21. S. Dagstuhl (2020)

  97. Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: International Conference on Data Engineering (ICDE), pp. 984–994. IEEE (2011)

  98. Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: International Conference on Management of Data (SIGMOD), pp. 627–640. ACM (2009)

  99. Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDBJ 19(1), 91–113 (2010)

  100. Ngo, H.Q., Porat, E., Ré, C., Rudra, A.: Worst-case optimal join algorithms. J. ACM 65(3), 16:1–16:40 (2018)

  101. Nguyen, V., Kim, K.: Efficient regular path query evaluation by splitting with unit-subquery cost matrix. IEICE Trans. Inf. Syst. 100(10), 2648–2652 (2017)

  102. Özsu, M.T.: A survey of RDF data management systems. Front. Comput. Sci. 10(3), 418–432 (2016)

  103. Pan, Z., Zhu, T., Liu, H., Ning, H.: A survey of RDF management technologies and benchmark datasets. J. Ambient Intell. Humaniz. Comput. 9(5), 1693–1704 (2018)

  104. Papadaki, M.-E., Spyratos, N., Tzitzikas, Y.: Towards interactive analytics over RDF graphs. Algorithms 14(2), 34 (2021)

  105. Papailiou, N., Konstantinou, I., Tsoumakos, D., Karras, P., Koziris, N.: H2RDF+: high-performance distributed joins over large-scale RDF graphs. In: Big Data, pp. 255–263 (2013)

  106. Papailiou, N., Tsoumakos, D., Karras, P., Koziris, N.: Graph-aware, workload-adaptive SPARQL query caching. In: International Conference on Management of Data (SIGMOD), pp. 1777–1792. ACM (2015)

  107. Peng, P., Ge, Q., Zou, L., Özsu, M.T., Xu, Z., Zhao, D.: Optimizing multi-query evaluation in federated RDF systems. IEEE TKDE 33(4), 1692–1707 (2021)

  108. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM TODS 34(3), 1–45 (2009)

  109. Pham, M., Boncz, P.A.: Exploiting emergent schemas to make RDF systems more efficient. In: International Semantic Web Conference (ISWC), pp. 463–479 (2016)

  110. Pibiri, G.E., Perego, R., Venturini, R.: Compressed indexes for fast search of semantic data. IEEE TKDE 33(9), 3187–3198 (2021)

  111. Picalausa, F., Luo, Y., Fletcher, G.H.L., Hidders, J., Vansummeren, S.: A structural approach to indexing triples. In: Extended Semantic Web Conference (ESWC), pp. 406–421. Springer (2012)

  112. Purohit, S., Van, N., Chin, G.: Semantic property graph for scalable knowledge graph analytics. CoRR arXiv:2009.07410 (2020)

  113. Ravindra, P., Kim, H., Anyanwu, K.: An intermediate algebra for optimizing RDF graph pattern matching on MapReduce. In: Extended Semantic Web Conference (ESWC), pp. 46–61. Springer (2011)

  114. Reutter, J.L., Soto, A., Vrgoc, D.: Recursion in SPARQL. In: International Semantic Web Conference (ISWC), pp. 19–35. Springer (2015)

  115. Rohloff, K., Schantz, R.E.: High-performance, massively scalable distributed systems using the mapreduce software framework: the SHARD triple-store. In: Programming Support Innovations for Emerging Distributed Applications (PSI EtA). ACM (2010)

  116. Sakr, S., Al-Naymat, G.: Relational processing of RDF queries: a survey. SIGMOD Rec. 38(4), 23–28 (2010)

  117. Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.N.: LSQ: the linked SPARQL queries dataset. In: International Semantic Web Conference (ISWC), pp. 261–269. Springer (2015)

  118. Schätzle, A., Przyjaciel-Zablocki, M., Lausen, G.: PigSPARQL: mapping SPARQL to Pig Latin. In: International Workshop on Semantic Web Information Management (SWIM) (2011)

  119. Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on Hadoop. In: International Semantic Web Conference (ISWC), pp. 164–179 (2014)

  120. Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on Spark. PVLDB 9(10), 804–815 (2016)

  121. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: International Conference on Database Theory (ICDT), pp. 4–33 (2010)

  122. Schreiber, G., Raimond, Y.: RDF 1.1 primer. W3C Working Group Note. http://www.w3.org/TR/rdf11-primer/ (2014)

  123. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: International Conference on Management of Data (SIGMOD), pp. 23–34. ACM (1979)

  124. Seufert, S., Anand, A., Bedathur, S.J., Weikum, G.: FERRARI: flexible and efficient reachability range assignment for graph indexing. In: International Conference on Data Engineering (ICDE), pp. 1009–1020. IEEE (2013)

  125. Shi, J., Yao, Y., Chen, R., Chen, H., Li, F.: Fast and concurrent RDF queries with RDMA-based distributed graph exploration. In: Conference on Operating Systems Design and Implementation (OSDI), pp. 317–332. USENIX (2016)

  126. Sintek, M., Kiesel, M.: RDFBroker: a signature-based high-performance RDF store. In: European Semantic Web Conference (ESWC), pp. 363–377. Springer (2006)

  127. Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: World Wide Web Conference (WWW), pp. 595–604. ACM (2008)

  128. Stuckenschmidt, H.: Similarity-based query caching. In: International Conference on Flexible Query Answering Systems (FQAS), pp. 295–306. Springer (2004)

  129. Stuckenschmidt, H., Vdovjak, R., Broekstra, J., Houben, G.: Towards distributed processing of RDF path queries. Int. J. Web Eng. Technol. 2(2/3), 207–230 (2005)

  130. Svoboda, M., Mlỳnková, I.: Linked data indexing methods: a survey. In: OTM Confederated International Conferences (OTM), pp. 474–483. Springer (2011)

  131. Thakkar, H., Angles, R., Rodriguez, M., Mallette, S., Lehmann, J.: Let’s build Bridges, not Walls: SPARQL Querying of TinkerPop Graph Databases with Sparql-Gremlin. In: International Conference on Semantic Computing (ICSC), pp. 408–415. IEEE (2020)

  132. Thompson, B.B., Personick, M., Cutcher, M.: The Bigdata® RDF Graph Database. In: Linked Data Management, pp. 193–237. CRC Press (2014)

  133. Tran, T., Ladwig, G., Rudolph, S.: Managing structured and semistructured RDF data using structure indexes. IEEE TKDE 25(9), 2076–2089 (2013)

  134. Tsialiamanis, P., Sidirourgos, L., Fundulaki, I., Christophides, V., Boncz, P.A.: Heuristics-based query optimisation for SPARQL. In: International Conference on Extending Database Technology (EDBT), pp. 324–335. ACM (2012)

  135. Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. In: Conference on Artificial Intelligence (AAAI), pp. 1465–1470. AAAI (2007)

  136. Veldhuizen, T.L.: Triejoin: a simple, worst-case optimal join algorithm. In: International Conference on Database Theory (ICDT), pp. 96–106. OpenProceedings.org (2014)

  137. Vidal, M., Ruckhaus, E., Lampo, T., Martínez, A., Sierra, J., Polleres, A.: Efficiently joining group patterns in SPARQL queries. In: Extended Semantic Web Conference (ESWC), pp. 228–242. Springer (2010)

  138. Vlachou, A., Doulkeridis, C., Glenis, A., Santipantakis, G.M., Vouros, G.A.: Efficient spatio-temporal RDF query processing in large dynamic knowledge bases. In: Symposium on Applied Computing (SAC), pp. 439–447. ACM (2019)

  139. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. CACM 57(10), 78–85 (2014)

  140. Wadhwa, S., Prasad, A., Ranu, S., Bagchi, A., Bedathur, S.: Efficiently answering regular simple path queries on large labeled networks. In: SIGMOD International Conference on Management of Data, pp. 1463–1480. ACM (2019)

  141. Wang, S., Lou, C., Chen, R., Chen, H.: Fast and concurrent RDF queries using RDMA-assisted GPU graph exploration. In: USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’18, USA, pp. 651–664. USENIX (2018)

  142. Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. PVLDB 1(1), 1008–1019 (2008)

  143. Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: International Conference on Semantic Web and Databases (SWDB), pp. 120–139. CEUR (2003)

  144. Williams, G.T., Weaver, J.: Enabling fine-grained HTTP caching of SPARQL query results. In: International Semantic Web Conference (ISWC), pp. 762–777. Springer (2011)

  145. Wood, D., Gearon, P., Adams, T.: Kowari: a platform for semantic web storage and analysis. In: XTech Conference, pp. 1–7 (2005)

  146. Wu, G., Li, J., Hu, J., Wang, K.: System pi: a native RDF repository based on the hypergraph representation for RDF data model. J. Comput. Sci. Technol. 24(4), 652–664 (2009)

  147. Wu, G., Yang, M.: Improving SPARQL query performance with algebraic expression tree based caching and entity caching. J. Zhejiang Univ. Sci. C 13(4), 281–294 (2012)

  148. Wylot, M., Hauswirth, M., Cudré-Mauroux, P., Sakr, S.: RDF data storage and query processing schemes: a survey. ACM CSUR 51(4), 84:1-84:36 (2018)

  149. Yakovets, N., Godfrey, P., Gryz, J.: Evaluation of SPARQL property paths via recursive SQL. In: Alberto Mendelzon International Workshop on Foundations of Data Management (AMW). CEUR (2013)

  150. Yakovets, N., Godfrey, P., Gryz, J.: Query planning for evaluating SPARQL property paths. In: International Conference on Management of Data (SIGMOD), pp. 1875–1889. ACM (2016)

  151. Yasin, M.Q., Zhang, X., Haq, R., Feng, Z., Yitagesu, S.: A comprehensive study for essentiality of graph based distributed SPARQL query processing. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 156–170. Springer (2018)

  152. Zambom Santana, L.H., dos Santos Mello, R.: An analysis of mapping strategies for storing RDF data into NoSQL databases. In: Symposium on Applied Computing (SAC), pp. 386–392. ACM (2020)

  153. Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. In: PVLDB, pp. 265–276 (2013)

  154. Zervakis, L., Setty, V., Tryfonopoulos, C., Hose, K.: Efficient continuous multi-query processing over graph streams. In: International Conference on Extending Database Technology (EDBT), pp. 13–24. OpenProceedings.org (2020)

  155. Zhang, W.E., Sheng, Q.Z., Taylor, K., Qin, Y.: Identifying and caching hot triples for efficient RDF query processing. In: Database Systems for Advanced Applications (DASFAA), pp. 259–274. Springer (2015)

  156. Zhang, X., Chen, L., Tong, Y., Wang, M.: EAGRE: towards scalable I/O efficient SPARQL query evaluation on the cloud. In: International Conference on Data Engineering (ICDE), pp. 565–576 (2013)

  157. Zou, L., Mo, J., Chen, L., Tamer Özsu, M., Zhao, D.: gStore: answering SPARQL queries via subgraph matching. PVLDB 4(8), 482–493 (2011)

Acknowledgements

This work was partially funded by a grant from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 860801. Hogan was supported by Fondecyt Grant No. 1181896 and ANID—Millennium Science Initiative Program—Code ICN17_002. Bin Yao was supported by the NSFC (61922054, 61872235, 61832017, 61729202, 61832013), the National Key Research and Development Program of China (2020YFB1710202, 2018YFC1504504), the Science and Technology Commission of Shanghai Municipality (STCSM) AI under Project 19511120300. This work was also supported by the German Federal Ministry of Education and Research (BMBF) within the EuroStars project E1114681 3DFed under the Grant No. 01QE2114 and project KnowGraphs (No. 860801).

Author information

Corresponding author

Correspondence to Bin Yao.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

About this article

Cite this article

Ali, W., Saleem, M., Yao, B. et al. A survey of RDF stores & SPARQL engines for querying knowledge graphs. The VLDB Journal 31, 1–26 (2022). https://doi.org/10.1007/s00778-021-00711-3

  • DOI: https://doi.org/10.1007/s00778-021-00711-3
