Pay-as-you-go Approximate Join Top-k Processing for the Web of Data

  • Andreas Wagner
  • Veli Bicer
  • Thanh Tran
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8465)


For effectively searching the Web of data, ranking of results is crucial. Top-k processing strategies have been proposed to allow efficient processing of such ranked queries. Top-k strategies aim at computing the k top-ranked results without materializing the complete result set. However, for many applications result computation time is much more important than result accuracy and completeness. Thus, there is a strong need for approximate ranked results. Unfortunately, previous work on approximate top-k processing is not well suited for the Web of data. In this paper, we propose the first approximate top-k join framework for Web data and queries. Our approach is very lightweight: necessary statistics are learned at runtime in a pay-as-you-go manner. We conducted extensive experiments on state-of-the-art SPARQL benchmarks. Our results are very promising: we could achieve up to 65% time savings, while maintaining high precision and recall.
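To illustrate what computing top-k results "without complete result materialization" can look like, the following is a minimal sketch of a standard exact rank join with a threshold-based stopping rule. It is not the approximate framework proposed in the paper. The function name `rank_join_top_k`, the input layout (lists of `(join_key, score)` pairs sorted by descending score), and the use of a sum as the ranking function are all assumptions made for this illustration.

```python
import heapq
from collections import defaultdict


def rank_join_top_k(left, right, k):
    """Exact top-k equi-join over two inputs sorted by descending score.

    Illustrative sketch only: result score = left score + right score.
    Returns (results, pulled), where results is a list of (score, key)
    in descending order and pulled counts the input tuples read.
    """
    if not left or not right or k <= 0:
        return [], 0
    inputs = (left, right)
    pos = [0, 0]                                # next unread position per input
    seen = (defaultdict(list), defaultdict(list))
    top = (left[0][1], right[0][1])             # best score of each input
    last = [top[0], top[1]]                     # last score read per input
    results = []                                # min-heap holding at most k results
    pulled = 0
    turn = 0
    while pos[0] < len(left) or pos[1] < len(right):
        if pos[turn] >= len(inputs[turn]):      # skip an exhausted input
            turn = 1 - turn
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        pulled += 1
        last[turn] = score
        seen[turn][key].append(score)
        # Join the new tuple with matching tuples already read from the other input.
        for other in seen[1 - turn].get(key, []):
            heapq.heappush(results, (score + other, key))
            if len(results) > k:
                heapq.heappop(results)
        # Upper bound on the score of any result that has not been produced yet:
        # such a result needs at least one unread tuple from some input.
        bounds = []
        if pos[0] < len(left):
            bounds.append(last[0] + top[1])
        if pos[1] < len(right):
            bounds.append(top[0] + last[1])
        if not bounds:
            break
        if len(results) == k and results[0][0] >= max(bounds):
            break                               # no unseen result can enter the top-k
        turn = 1 - turn
    return sorted(results, reverse=True), pulled


left = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
right = [("b", 10), ("a", 7), ("d", 2), ("c", 1)]
print(rank_join_top_k(left, right, 1))
```

In this toy input the best result is found after reading 4 of the 8 input tuples. An approximate strategy, as motivated in the abstract, would trade some accuracy for stopping even earlier.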







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andreas Wagner (1)
  • Veli Bicer (2)
  • Thanh Tran (1)
  1. Karlsruhe Institute of Technology, Germany
  2. IBM Research Centre, Dublin, Ireland
