Enhancing Source Selection for Live Queries over Linked Data via Query Log Mining

  • Yuan Tian
  • Jürgen Umbrich
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7185)

Abstract

Traditionally, Linked Data query engines execute SPARQL queries over a materialised repository, which on the one hand guarantees fast query answering but on the other hand requires time- and resource-consuming preprocessing steps. In addition, materialised repositories face the ongoing challenge of keeping their index up to date, which, given the size of the Web, is practically infeasible. Consequently, the results for a given SPARQL query are potentially outdated. Recent approaches address this result-freshness problem by answering a given query directly over dereferenced, query-relevant Web documents. Our work investigates the problem of efficiently selecting query-relevant sources in this setting. As part of query optimization, source selection tries to estimate the minimum number of sources that must be accessed in order to answer a query. We propose to summarize and index sources based on frequently occurring query graph patterns mined from query logs. We verify the applicability of our approach and show empirically that it significantly reduces the estimated number of relevant sources while keeping the overhead low.
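To make the idea of pattern-based source summaries concrete, here is a minimal Python sketch. It is not the authors' algorithm; it rests on simplifying assumptions that do not come from the paper: query graph patterns are restricted to subject "stars" (sets of predicates sharing one subject), frequency is counted exactly against a hypothetical `min_support` threshold rather than with a graph-mining or data-stream algorithm, and every triple about a subject is assumed to live in a single source document. All function names (`stars`, `mine_frequent_patterns`, `build_index`, `select_sources`, `baseline_sources`) and the `doc:`/`ex:` identifiers are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations

# A triple (pattern) is (subject, predicate, object); query variables start with "?".
Triple = tuple[str, str, str]


def stars(triples: list[Triple]) -> list[frozenset[str]]:
    """Group triples by subject and return each subject's predicate set (a 'star')."""
    by_subject: dict[str, set[str]] = defaultdict(set)
    for s, p, _o in triples:
        by_subject[s].add(p)
    return [frozenset(preds) for preds in by_subject.values()]


def mine_frequent_patterns(log: list[list[Triple]], min_support: int) -> set[frozenset[str]]:
    """Count every sub-star of size >= 2 in the query log (at most once per query)
    and keep those occurring in at least min_support queries."""
    counts: Counter = Counter()
    for query in log:
        seen: set[frozenset[str]] = set()
        for star in stars(query):
            for k in range(2, len(star) + 1):
                for sub in combinations(sorted(star), k):
                    seen.add(frozenset(sub))
        counts.update(seen)
    return {pattern for pattern, c in counts.items() if c >= min_support}


def build_index(sources: dict[str, list[Triple]], patterns: set[frozenset[str]]):
    """Pattern index: frequent pattern -> sources in which one subject carries all its predicates.
    Predicate index (baseline): predicate -> sources that use it at all."""
    pattern_index: dict[frozenset[str], set[str]] = {pattern: set() for pattern in patterns}
    predicate_index: dict[str, set[str]] = defaultdict(set)
    for uri, triples in sources.items():
        for _s, p, _o in triples:
            predicate_index[p].add(uri)
        source_stars = stars(triples)
        for pattern in patterns:
            if any(pattern <= star for star in source_stars):
                pattern_index[pattern].add(uri)
    return pattern_index, predicate_index


def baseline_sources(query: list[Triple], predicate_index) -> set[str]:
    """Predicate-level estimate: every source that uses any predicate of the query."""
    return set().union(*(predicate_index.get(p, set()) for _s, p, _o in query))


def select_sources(query: list[Triple], pattern_index, predicate_index) -> set[str]:
    """Pattern-level estimate: for each query star, keep only sources that match every
    indexed frequent pattern contained in that star; otherwise fall back to predicates."""
    selected: set[str] = set()
    for star in stars(query):
        matches = [uris for pattern, uris in pattern_index.items() if pattern <= star]
        if matches:
            selected |= set.intersection(*matches)
        else:
            for p in star:
                selected |= predicate_index.get(p, set())
    return selected


# Toy data: only doc:a has a single subject with both foaf:name and foaf:knows.
sources = {
    "doc:a": [("ex:alice", "foaf:name", '"Alice"'), ("ex:alice", "foaf:knows", "ex:bob")],
    "doc:b": [("ex:bob", "foaf:name", '"Bob"')],
    "doc:c": [("ex:carol", "foaf:knows", "ex:dave")],
}
query = [("?x", "foaf:name", "?n"), ("?x", "foaf:knows", "?y")]

patterns = mine_frequent_patterns([query, query], min_support=2)
pattern_index, predicate_index = build_index(sources, patterns)
print(sorted(baseline_sources(query, predicate_index)))  # ['doc:a', 'doc:b', 'doc:c']
print(sorted(select_sources(query, pattern_index, predicate_index)))  # ['doc:a']
```

On this toy data the predicate-level lookup returns all three documents, while the pattern index keeps only `doc:a`, the one source in which a single subject carries both predicates. This illustrates why summaries built from co-occurring query patterns can prune sources that a per-predicate index cannot; under the single-document assumption above, the pruning does not discard any source needed for the star.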



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yuan Tian (1)
  • Jürgen Umbrich (2)
  • Yong Yu (1)

  1. Shanghai Jiao Tong University, Shanghai, China
  2. Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
