Diversified Stress Testing of RDF Data Management Systems

  • Güneş Aluç
  • Olaf Hartig
  • M. Tamer Özsu
  • Khuzaima Daudjee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8796)

Abstract

The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF data continue to be published across heterogeneous domains and integrated at Web scale, such as in the Linked Open Data (LOD) cloud, RDF data management systems are being exposed to queries that are far more diverse and workloads that are far more varied than before. The first contribution of our work is an in-depth experimental analysis showing that existing SPARQL benchmarks are not suitable for testing systems against diverse queries and varied workloads. To address these shortcomings, our second contribution is the Waterloo SPARQL Diversity Test Suite (WatDiv), which provides stress testing tools for RDF data management systems. Using WatDiv, we have been able to reveal issues with existing systems that went unnoticed in evaluations using earlier benchmarks. Specifically, our experiments with five popular RDF data management systems show that they cannot deliver good performance uniformly across workloads. For some queries, the query execution times of the fastest and the slowest system differ by as much as five orders of magnitude, and the fastest system on one query may unexpectedly time out on another. Through a detailed analysis, we trace these problems to specific types of queries and workloads.

Keywords

RDF, SPARQL, systems, benchmarking, workload diversity

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Güneş Aluç (1)
  • Olaf Hartig (1)
  • M. Tamer Özsu (1)
  • Khuzaima Daudjee (1)

  1. David R. Cheriton School of Computer Science, Waterloo, Canada
