A survey of RDF management technologies and benchmark datasets

  • Zhengyu Pan
  • Tao Zhu
  • Hong Liu
  • Huansheng Ning
Original Research

Abstract

With the rapid development of the semantic web and related areas, the amount of Resource Description Framework (RDF) data has increased significantly. Managing this volume of RDF data efficiently has become a challenging task and has attracted considerable research attention. This paper surveys state-of-the-art RDF storage and query technologies, organized by several classification criteria. In addition, several prevailing benchmark datasets are introduced and compared. Finally, future research challenges and opportunities are discussed.

Keywords

RDF, RDF management, RDF benchmark dataset, Semantic web, SPARQL

Notes

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61471035, 61601129) and the double first class construct program of USC (No. 2017SYL16).

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer and Communication Engineering, University of Science & Technology Beijing, Beijing, China
  2. School of Software, University of South China, Hengyang, China
  3. School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
  4. Beijing Engineering Research Center for Cyberspace Data Analysis and Applications, Beijing, China
