Efficient Execution of Top-K SPARQL Queries

  • Sara Magliacane
  • Alessandro Bozzon
  • Emanuele Della Valle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7649)

Abstract

Top-k queries, i.e., queries returning the top k results ordered by a user-defined scoring function, are an important category of queries. Order is an important property of data that can be exploited to speed up query processing. State-of-the-art SPARQL engines underuse order, and top-k queries are mostly managed with a materialize-then-sort processing scheme that computes all the matching solutions (e.g., thousands) even if only a limited number k (e.g., ten) are requested. The SPARQL-RANK algebra is an extended SPARQL algebra that treats order as a first-class citizen, enabling efficient split-and-interleave processing schemes that can be adopted to improve the performance of top-k SPARQL queries. In this paper, we propose an incremental execution model for SPARQL-RANK queries, compare the performance of alternative physical operators, and propose a rank-aware join algorithm optimized for native RDF stores. Experiments conducted with an open-source implementation of a SPARQL-RANK query engine based on ARQ show that the evaluation of top-k queries can be sped up by orders of magnitude.
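To make the contrast between materialize-then-sort and rank-aware, incremental processing concrete, here is a minimal sketch of a single two-way join. It is a hypothetical illustration, not the paper's SPARQL-RANK operators or its RDF-optimized join. `materialize_then_sort` computes every solution before sorting, while `rank_join` follows the general idea of the hash rank join (HRJN) operator from the relational top-k literature: it reads both inputs in descending score order, buffers join results, and emits a result as soon as its score is at least an upper bound on any result not yet discovered. Assumptions (not from the source): each solution is reduced to a hypothetical `(join key, partial score)` pair, both inputs are already sorted by descending partial score, and the scoring function is the sum of the two partial scores.

```python
import heapq
from collections import defaultdict

# Hypothetical representation: a solution mapping reduced to (join key, partial score).
Binding = tuple[str, float]


def materialize_then_sort(left: list[Binding], right: list[Binding], k: int) -> list[tuple[float, str]]:
    """Baseline: build every join result, sort all of them, then keep the first k."""
    results = [(ls + rs, lkey) for lkey, ls in left for rkey, rs in right if lkey == rkey]
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:k]


def rank_join(left: list[Binding], right: list[Binding], k: int) -> tuple[list[tuple[float, str]], int]:
    """HRJN-style rank join (sketch); inputs must be sorted by descending partial score.

    Returns the top-k results as (score, key) pairs and the number of input bindings read.
    """
    inputs = (iter(left), iter(right))
    seen = (defaultdict(list), defaultdict(list))  # hash table per input: key -> scores
    top = [None, None]   # first (highest) score read from each input
    last = [None, None]  # most recently read (lowest so far) score from each input
    done = [False, False]
    candidates = []  # buffered join results as a max-heap via negated total score
    out = []
    reads, side = 0, 0
    while len(out) < k:
        if not all(done):
            if done[side]:
                side = 1 - side  # keep pulling from the input that still has data
            item = next(inputs[side], None)
            if item is None:
                done[side] = True
            else:
                reads += 1
                key, score = item
                if top[side] is None:
                    top[side] = score
                last[side] = score
                seen[side][key].append(score)
                # Probe the other input's hash table for join partners.
                for other in seen[1 - side][key]:
                    heapq.heappush(candidates, (-(score + other), key))
            side = 1 - side  # alternate inputs (round-robin pulling)
        # Upper bound on the score of any join result not yet discovered.
        if all(done):
            threshold = float("-inf")
        elif top[0] is None or top[1] is None:
            threshold = float("inf")
        else:
            threshold = max(top[0] + last[1], last[0] + top[1])
        # A buffered result is safe to emit once no future result can beat it.
        while candidates and len(out) < k and -candidates[0][0] >= threshold:
            neg_score, key = heapq.heappop(candidates)
            out.append((-neg_score, key))
        if all(done) and not candidates:
            break
    return out, reads
```

For instance, with `left = [("a", 10), ("b", 8), ("c", 5), ("d", 1)]`, `right = [("b", 9), ("c", 7), ("a", 3), ("d", 1)]` and `k = 2`, `rank_join` returns `[(17, "b"), (13, "a")]` after reading 7 of the 8 input bindings, the same answer the baseline produces after reading and joining everything. The saving on such a toy input is small; the point is the early-termination test, which lets the operator stop consuming inputs once k results are guaranteed.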



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sara Magliacane (1, 2)
  • Alessandro Bozzon (1)
  • Emanuele Della Valle (1)
  1. Politecnico of Milano, Milano, Italy
  2. VU University Amsterdam, The Netherlands
