DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7031)

Abstract

Triple stores are the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and thus settled on measuring performance with SQL-like queries against a relational database that had been converted to RDF. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triple stores, and we provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is far less homogeneous than suggested by previous benchmarks.
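As a rough illustration of the SPARQL feature analysis step named in the abstract, the following minimal sketch counts which SPARQL keywords occur in the distinct queries of a log. The feature list, the function names and the sample log are assumptions made for illustration; the abstract does not specify the feature set, the log format or the clustering method, and this sketch does not attempt clustering.

```python
import re
from collections import Counter

# Hypothetical feature list; the abstract does not enumerate the features analysed.
FEATURES = ("OPTIONAL", "UNION", "FILTER", "DISTINCT", "REGEX", "LANG", "STR")


def extract_features(query):
    """Return the set of keywords from FEATURES that occur in the query text.

    This is a plain keyword match, so a keyword inside a string literal or an
    IRI would also be counted; a real analysis would parse the query instead.
    """
    upper = query.upper()
    return frozenset(f for f in FEATURES if re.search(r"\b" + f + r"\b", upper))


def feature_profile(log):
    """Count how many distinct queries use each combination of features."""
    # Normalise whitespace so trivially different copies collapse into one query.
    distinct = {" ".join(q.split()) for q in log}
    return Counter(extract_features(q) for q in distinct)


# Made-up sample log, not taken from the DBpedia query logs.
sample_log = [
    "SELECT DISTINCT ?s WHERE { ?s a ?type }",
    "SELECT  DISTINCT ?s WHERE { ?s a ?type }",  # duplicate up to whitespace
    "SELECT ?s WHERE { ?s ?p ?o . OPTIONAL { ?s rdfs:label ?l } "
    "FILTER(lang(?l) = 'en') }",
]

profile = feature_profile(sample_log)
for features, count in profile.items():
    print(sorted(features), count)
```

Grouping queries by feature combination in this way gives one simple view of which constructs a benchmark would need to cover; picking representative queries from a real log would require the mining and clustering steps that the abstract mentions but does not detail.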

Keywords

  • Resource Description Framework
  • Dataset Size
  • SPARQL Query
  • Triple Pattern
  • Resource Description Framework Data

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This work was supported by a grant from the European Union’s 7th Framework Programme provided for the project LOD2 (GA no. 257943).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, AC. (2011). DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data. In: , et al. The Semantic Web – ISWC 2011. ISWC 2011. Lecture Notes in Computer Science, vol 7031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25073-6_29

  • DOI: https://doi.org/10.1007/978-3-642-25073-6_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25072-9

  • Online ISBN: 978-3-642-25073-6

  • eBook Packages: Computer Science, Computer Science (R0)