Abstract
Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client’s efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow clients to reach full result recall earlier when temporarily accepting lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
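To illustrate the idea of approximate membership metadata, the following is a minimal sketch of a Bloom filter over triples, not the implementation evaluated in the paper. The names `BloomFilter`, `triple_key`, and `filter_candidates`, the SHA-256 double-hashing scheme, and the example triples are all assumptions made for this illustration. A negative answer from the filter is definitive, so the corresponding membership request can be skipped; a positive answer may be a false positive and would still need confirmation, or could be accepted temporarily at lower precision.

```python
import hashlib
import math


class BloomFilter:
    """Small Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.size = max(1, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / n * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two parts of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def triple_key(subject: str, predicate: str, obj: str) -> str:
    """Hypothetical serialization of a triple into a single filter key."""
    return f"{subject} {predicate} {obj}"


def filter_candidates(candidates, bloom):
    """Split candidate triples into 'maybe present' and 'definitely absent'."""
    maybe, absent = [], []
    for triple in candidates:
        (maybe if triple_key(*triple) in bloom else absent).append(triple)
    return maybe, absent


# Hypothetical data: triples a server might hold for one fragment.
known = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:dave"),
]
bloom = BloomFilter(expected_items=len(known), false_positive_rate=0.01)
for t in known:
    bloom.add(triple_key(*t))

candidates = known + [("ex:alice", "ex:knows", "ex:zoe")]
maybe, absent = filter_candidates(candidates, bloom)
print(len(maybe), "candidates may exist;", len(absent), "can be skipped without a request")
```

In this sketch, only the triples in `maybe` would require a follow-up membership request, which is where a reduction in HTTP requests would come from.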
For Johan De Smedt. Thanks to Daniel P. Miranker for his suggestions on Bloom filters.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vander Sande, M., Verborgh, R., Van Herwegen, J., Mannens, E., Van de Walle, R. (2015). Opportunistic Linked Data Querying Through Approximate Membership Metadata. In: Arenas, M., et al. The Semantic Web - ISWC 2015. ISWC 2015. Lecture Notes in Computer Science, vol 9366. Springer, Cham. https://doi.org/10.1007/978-3-319-25007-6_6
DOI: https://doi.org/10.1007/978-3-319-25007-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25006-9
Online ISBN: 978-3-319-25007-6
eBook Packages: Computer Science, Computer Science (R0)