A survey of RDF data management systems

  • Review Article
  • Frontiers of Computer Science

Abstract

RDF is increasingly used to encode data for the Semantic Web and for data exchange. A large body of work addresses RDF data management, following different approaches. This paper provides an overview of that work. The review covers centralized solutions (referred to as warehousing approaches), distributed solutions, and techniques developed for querying linked data. Within each category, further classifications are provided to help readers understand the identifying characteristics of the different approaches.


Author information

Corresponding author

Correspondence to M. Tamer Özsu.

Additional information

M. Tamer Özsu is a professor of computer science at the University of Waterloo, Canada. Dr. Özsu’s current research focuses on large scale data distribution, and management of unconventional data (e.g., graphs, RDF, XML, and streams). He is a fellow of ACM and IEEE, an elected member of the Science Academy of Turkey, and a member of Sigma Xi and AAAS.

About this article

Cite this article

Özsu, M.T. A survey of RDF data management systems. Front. Comput. Sci. 10, 418–432 (2016). https://doi.org/10.1007/s11704-016-5554-y

  • DOI: https://doi.org/10.1007/s11704-016-5554-y
