Opportunistic Linked Data Querying Through Approximate Membership Metadata

  • Miel Vander Sande
  • Ruben Verborgh
  • Joachim Van Herwegen
  • Erik Mannens
  • Rik Van de Walle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)


Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers at the cost of higher bandwidth. Increasing a client’s efficiency means lowering the number of requests, which can be achieved, among other means, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments involve a significant proportion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
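A membership subquery asks whether one fully bound triple exists. If a fragment carries a Bloom filter over its triples, the client can rule out absent triples locally instead of issuing an HTTP request. The following is a minimal sketch of that idea, not the paper's implementation: the string serialization in `triple_key`, the SHA-256 double-hashing scheme, and the helper `resolve_membership` with its `fetch_fragment` callback are all hypothetical choices made for illustration.

```python
import hashlib
import math
from typing import Callable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def triple_key(triple: Triple) -> str:
    """Serialize a triple as one string (hypothetical, N-Triples-like)."""
    return " ".join(triple) + " ."


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        n = max(1, capacity)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = math.ceil(-n * math.log(error_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, hence non-zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: present, or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def resolve_membership(
    bloom: BloomFilter,
    triple: Triple,
    fetch_fragment: Optional[Callable[[Triple], bool]] = None,
) -> bool:
    """Answer a membership subquery, requesting the server only when needed.

    `fetch_fragment` is a hypothetical callback that requests the fully bound
    triple pattern from the server and reports whether it matched.
    """
    if not bloom.might_contain(triple_key(triple)):
        return False  # definite negative: no HTTP request needed
    if fetch_fragment is None:
        return True  # accept a possible false positive: full recall, lower precision
    return fetch_fragment(triple)  # confirm with one HTTP request


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, error_rate=0.01)
    bf.add(triple_key(("ex:alice", "foaf:knows", "ex:bob")))
    print(resolve_membership(bf, ("ex:alice", "foaf:knows", "ex:bob")))  # True
```

A negative answer from the filter always saves a request. A positive answer can either be accepted immediately, which keeps recall complete while letting some false positives through, or be confirmed with one request, which restores full precision at the usual cost.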
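Golomb-coded sets offer the same kind of approximate membership test in a more compact encoding than a Bloom filter at a comparable false-positive rate, at the price of slower, sequential lookups. A minimal sketch under stated assumptions: each item is hashed into [0, n·2^r) with SHA-256, the distinct hashes are sorted, and the gaps between consecutive hashes are stored with Golomb-Rice coding (quotient in unary, remainder in r bits). The power-of-two parameter, the hash function, and the names `GolombCodedSet` and `hash_to_range` are hypothetical illustration choices, not the paper's code.

```python
import hashlib
from typing import Iterable, List


def hash_to_range(item: str, modulus: int) -> int:
    """Map an item into [0, modulus) using SHA-256 (assumed hash choice)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class GolombCodedSet:
    """Minimal Golomb-coded set using a power-of-two (Golomb-Rice) parameter."""

    def __init__(self, items: Iterable[str], fp_bits: int = 7) -> None:
        unique = set(items)
        self.r = fp_bits  # false-positive rate is roughly 1 / 2**fp_bits
        self.modulus = max(1, len(unique)) << fp_bits
        hashes = sorted({hash_to_range(x, self.modulus) for x in unique})
        self.count = len(hashes)

        bits: List[int] = []
        previous = 0
        for h in hashes:
            delta, previous = h - previous, h
            quotient, remainder = delta >> self.r, delta & ((1 << self.r) - 1)
            bits.extend([1] * quotient)  # quotient in unary...
            bits.append(0)               # ...terminated by a single 0
            # remainder in exactly r bits, most significant bit first
            bits.extend((remainder >> i) & 1 for i in reversed(range(self.r)))

        # Pack the bit list into bytes, as it would be transferred to a client.
        self.data = bytearray((len(bits) + 7) // 8)
        for i, bit in enumerate(bits):
            if bit:
                self.data[i // 8] |= 0x80 >> (i % 8)

    def _bit(self, index: int) -> int:
        return (self.data[index // 8] >> (7 - index % 8)) & 1

    def might_contain(self, item: str) -> bool:
        """Decode gaps sequentially until reaching or passing the item's hash."""
        target = hash_to_range(item, self.modulus)
        position, value = 0, 0
        for _ in range(self.count):
            quotient = 0
            while self._bit(position):
                quotient += 1
                position += 1
            position += 1  # skip the terminating 0
            remainder = 0
            for _ in range(self.r):
                remainder = (remainder << 1) | self._bit(position)
                position += 1
            value += (quotient << self.r) | remainder
            if value >= target:
                return value == target
        return False
```

With n items and parameter r, the encoding above never exceeds n·(r + 2) bits, because the unary quotients sum to at most n. Each lookup, however, must decode gaps from the start of the stream, which illustrates the space/time trade-off between the two structures.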


Keywords: Linked Data, Querying, Availability, Scalability, SPARQL




  1. 1.
    Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 197–212. Springer, Heidelberg (2014) Google Scholar
  2. 2.
    Basca, C., Bernstein, A.: Avalanche: putting the spirit of the Web back into semantic web querying. In: Scalable Semantic Web Knowledge Base Systems, pp. 64–79 (2010)Google Scholar
  3. 3.
    Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13(7), 422–426 (1970)CrossRefzbMATHGoogle Scholar
  4. 4.
    Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y.: SPARQL web-querying infrastructure: ready for action? In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 277–293. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  5. 5.
    Ermilov, I., Martin, M., Lehmann, J., Auer, S.: Linked open data statistics: collection and exploitation. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2013. CCIS, vol. 394, pp. 242–249. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  6. 6.
    Feigenbaum, L., Williams, G.T., Clark, K.G., Torres, E.: sparql 1.1. protocol. Recommendation, w3c, March 2013.
  7. 7.
    Filali, I., Bongiovanni, F., Huet, F., Baude, F.: A survey of structured P2P systems for RDF data storage and retrieval. In: Hameurlain, A., Küng, J., Wagner, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems III. LNCS, vol. 6790, pp. 20–55. Springer, Heidelberg (2011) CrossRefGoogle Scholar
  8. 8.
    Gallager, R., Van Voorhis, D.C.: Optimal source codes for geometrically distributed integer alphabets. Transactions on Information Theory 21(2), 228–230 (1975)CrossRefzbMATHGoogle Scholar
  9. 9.
    Graefe, G.: Query evaluation techniques for large databases. acm Computing Surveys 25(2), 73–169 (1993)Google Scholar
  10. 10.
    Harris, S., Seaborne, A.: sparql 1.1 query language. Recommendation, w3c, March 2013.
  11. 11.
    Heine, F.: Scalable p2p based RDF querying. In: Proceedings of the 1st International Conference on Scalable Information Systems (2006)Google Scholar
  12. 12.
    Hose, K., Schenkel, R.: Towards benefit-based rdf source selection for sparql queries. In: Proc. of the 4th International Workshop on Semantic Web Information Management, pp. 1–8 (2012)Google Scholar
  13. 13.
    Huang, H., Liu, C.: Estimating selectivity for joined rdf triple patterns. In: Conference on Information and Knowledge Management, pp. 1435–1444 (2011)Google Scholar
  14. 14.
    Li, J., Vuong, S.: Ontsum: a semantic query routing scheme in p2p networks based on concise ontology indexing. In: Advanced Information Networking and Applications, May 2007Google Scholar
  15. 15.
    Mitzenmacher, M.: Compressed Bloom filters. Transactions on Networking 10(5) (2002)Google Scholar
  16. 16.
    Neumann, T., Weikum, G.: Scalable join processing on very large rdf graphs. In: Proceedings of the International Conference on Management of Data, pp. 627–640. ACM (2009)Google Scholar
  17. 17.
    Oren, E., Guéret, C., Schlobach, S.: Anytime query answering in RDF through evolutionary algorithms. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 98–113. Springer, Heidelberg (2008) CrossRefGoogle Scholar
  18. 18.
    Pu, X., Wang, J., Luo, P., Wang, M.: Aweto: efficient incremental update and querying in rdf storage system. In: Proceedings of the 20th International Conference on Information and Knowledge Management, pp. 2445–2448. ACM (2011)Google Scholar
  19. 19.
    Putze, F., Sanders, P., Singler, J.: Cache-, hash-, and space-efficient Bloom filters. Journal of Experimental Algorithmics 14(4) (2009)Google Scholar
  20. 20.
    Ravindra, P., Hong, S., Kim, H., Anyanwu, K.: Efficient processing of rdf graph pattern matching on MapReduce platforms. In: Proceedings of the 2nd International Workshop on Data Intensive Computing in the Clouds, pp. 13–20 (2011)Google Scholar
  21. 21.
    Rietveld, L., Verborgh, R., Beek, W., Vander Sande, M., Schlobach, S.: Linked data-as-a-service: the semantic web redeployed. In: 12th Extended Semantic Web Conference (2015)Google Scholar
  22. 22.
    Schmachtenberg, M., Bizer, C., Paulheim, H.: Adoption of the linked data best practices in different topical domains. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 245–260. Springer, Heidelberg (2014) Google Scholar
  23. 23.
    Van Herwegen, J., Verborgh, R., Mannens, E., Van de Walle, R.: Query execution optimization for clients of triple pattern fragments. In: Extended Semantic Web Conference, June 2015Google Scholar
  24. 24.
    Verborgh, R.: Triple Pattern Fragments. Unofficial draft, Hydra w3c Community Group.
  25. 25.
    Verborgh, R., et al.: Querying datasets on the web with high availability. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 180–196. Springer, Heidelberg (2014) Google Scholar
  26. 26.
    Verborgh, R., Mannens, E., Van de Walle, R.: Initial usage analysis of DBpedia’s triple pattern fragments. In: Proc. of the 5th Workshop on Usage Analysis and the Web of Data (2015)Google Scholar
  27. 27.
    Zhang, X., Chen, L., Wang, M.: Towards efficient join processing over large RDF graph using MapReduce. In: Ailamaki, A., Bowers, S. (eds.) SSDBM 2012. LNCS, vol. 7338, pp. 250–259. Springer, Heidelberg (2012) CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Miel Vander Sande, Ruben Verborgh, Joachim Van Herwegen, Erik Mannens, Rik Van de Walle
    Multimedia Lab, Ghent University – iMinds, Ledeberg-Ghent, Belgium
