International Semantic Web Conference

The Semantic Web – ISWC 2015, pp. 52–69

FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework

  • Muhammad Saleem
  • Qaiser Mehmood
  • Axel-Cyrille Ngonga Ngomo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)

Abstract

Benchmarking is indispensable when aiming to assess technologies with respect to their suitability for given tasks. While several benchmarks and benchmark generation frameworks have been developed to evaluate triple stores, they mostly provide a one-size-fits-all solution to the benchmarking problem. Such an approach is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address this drawback by presenting FEASIBLE, an automatic approach for generating benchmarks out of the query history of applications, i.e., query logs. The generation is achieved by selecting prototypical queries of a user-defined size from the input set of queries. We evaluate our approach on two query logs and show that the benchmarks it generates are accurate approximations of the input query logs. Moreover, we compare four different triple stores with benchmarks generated using our approach and show that they behave differently based on the data they contain and the types of queries posed. Our results suggest that FEASIBLE generates better sample queries than the state of the art. In addition, the better query selection and the larger set of query types used lead to triple store rankings that partly differ from the rankings generated by previous works.
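To illustrate the idea of selecting prototypical queries from a log, the sketch below represents each logged query as a numeric feature vector (e.g., number of triple patterns, result size, use of FILTER), normalizes the features, and picks a user-defined number of exemplars. This is a minimal, hypothetical illustration using a basic k-means clustering followed by nearest-to-centroid selection; the feature names, function names, and the clustering strategy are assumptions for exposition and are not FEASIBLE's exact selection algorithm as published.

```python
import math
import random

def normalize(vectors):
    """Scale each feature dimension to [0, 1] so no single
    feature dominates the distance computation."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [(v[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
         for d in range(dims)]
        for v in vectors
    ]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_benchmark(queries, vectors, k, iterations=20, seed=42):
    """Pick up to k prototypical queries: cluster the normalized
    feature vectors with a plain k-means, then return the real
    query closest to each centroid (duplicates removed)."""
    rng = random.Random(seed)
    points = normalize(vectors)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assign every query to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(members[0]))
                ]
    chosen = []
    for c in centroids:
        best = min(range(len(points)), key=lambda j: dist(points[j], c))
        if queries[best] not in chosen:
            chosen.append(queries[best])
    return chosen
```

For example, given six logged queries with hypothetical features `[triple_patterns, result_size, has_filter]`, `select_benchmark(queries, vectors, 2)` returns at most two queries from the log, each standing in for a cluster of structurally similar queries.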



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Muhammad Saleem¹
  • Qaiser Mehmood²
  • Axel-Cyrille Ngonga Ngomo¹

  1. Universität Leipzig, IFI/AKSW, Leipzig, Germany
  2. Insight Center for Data Analytics, National University of Ireland, Galway, Ireland
