NoSQL Databases for RDF: An Empirical Evaluation

  • Philippe Cudré-Mauroux
  • Iliya Enchev
  • Sever Fundatureanu
  • Paul Groth
  • Albert Haque
  • Andreas Harth
  • Felix Leif Keppmann
  • Daniel Miranker
  • Juan F. Sequeda
  • Marcin Wylot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8219)

Abstract

Processing large volumes of RDF data requires sophisticated tools. In recent years, much effort has been spent on optimizing native RDF stores and on repurposing relational query engines for large-scale RDF processing. Concurrently, a number of new data management systems, grouped under the NoSQL (for “not only SQL”) umbrella, rapidly rose to prominence and today represent a popular alternative to classical databases. Though NoSQL systems are increasingly used to manage RDF data, it is still difficult to grasp their key advantages and drawbacks in this context. This work is, to the best of our knowledge, the first systematic attempt at characterizing and comparing NoSQL stores for RDF processing. In the following, we describe four different NoSQL stores and compare their key characteristics when running standard RDF benchmarks on a popular cloud infrastructure using both single-machine and distributed deployments.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Philippe Cudré-Mauroux (1)
  • Iliya Enchev (1)
  • Sever Fundatureanu (2)
  • Paul Groth (2)
  • Albert Haque (3)
  • Andreas Harth (4)
  • Felix Leif Keppmann (4)
  • Daniel Miranker (3)
  • Juan F. Sequeda (3)
  • Marcin Wylot (1)

  1. University of Fribourg, Switzerland
  2. VU University Amsterdam, The Netherlands
  3. University of Texas at Austin, USA
  4. Karlsruhe Institute of Technology, Germany
