R3F: RDF triple filtering method for efficient SPARQL query processing

Abstract

With the rapid growth in the amount of graph-structured Resource Description Framework (RDF) data, SPARQL query processing has received significant attention. The core of SPARQL query processing is subgraph pattern matching. For this, most RDF stores use relation-based approaches, which can produce a vast number of redundant intermediate results during query evaluation. To address this problem, we propose an RDF Triple Filtering (R3F) method that exploits the graph-structural information of RDF data. We design a path-based index, called the RDF Path index (RP-index), to efficiently provide filter data for the triple filtering. We also propose a relational operator, called the RDF Filter (RFLT), that performs the triple filtering with little overhead relative to the original query processing. Through comprehensive experiments on large-scale RDF datasets, we demonstrate that R3F effectively and efficiently reduces the number of redundant intermediate results and improves query performance.
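
The idea of path-based triple filtering can be illustrated with a small sketch. This is not the RP-index or the RFLT operator as implemented in the article; it is a toy in-memory version under stated assumptions. It assumes the index maps each predicate path (up to a maximum length) to the set of vertices at which that path ends, and that a triple can be discarded when its subject lacks the incoming path required by the query. The names `build_path_index`, `filter_triples`, and the sample data are hypothetical.

```python
from collections import defaultdict

def build_path_index(triples, max_len=2):
    """Toy path index (an assumption, not the article's RP-index layout):
    maps a tuple of predicates to the set of vertices where that path ends."""
    index = defaultdict(set)
    out_edges = defaultdict(list)
    for s, p, o in triples:
        index[(p,)].add(o)           # length-1 path ends at the object
        out_edges[s].append((p, o))
    frontier = {path: set(vs) for path, vs in index.items()}
    for _ in range(max_len - 1):     # extend every path by one predicate
        longer = defaultdict(set)
        for path, vertices in frontier.items():
            for v in vertices:
                for p, o in out_edges.get(v, ()):
                    longer[path + (p,)].add(o)
        for path, vs in longer.items():
            index[path] |= vs
        frontier = longer
    return index

def filter_triples(triples, index, incoming_path):
    """Toy filter: keep triples whose subject has `incoming_path` ending at it."""
    allowed = index.get(tuple(incoming_path), set())
    return [t for t in triples if t[0] in allowed]

# Hypothetical data for the pattern: ?x advisor ?y . ?y worksFor ?z
triples = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "deptA"),
    ("carol", "worksFor", "deptB"),   # carol has no incoming "advisor" edge
    ("dave", "advisor", "erin"),
    ("erin", "worksFor", "deptA"),
]
index = build_path_index(triples, max_len=2)
candidates = [t for t in triples if t[1] == "worksFor"]
filtered = filter_triples(candidates, index, ("advisor",))
print(filtered)
```

In this sketch the `carol` triple is dropped before any join is attempted, which is the kind of redundant intermediate result the abstract refers to. A real system would need a disk-resident index and a filter that works on sorted triple scans; those details are outside what this example shows.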

Author information

Correspondence to Kisung Kim.

About this article

Cite this article

Kim, K., Moon, B. & Kim, H. R3F: RDF triple filtering method for efficient SPARQL query processing. World Wide Web 18, 317–357 (2015). https://doi.org/10.1007/s11280-013-0253-1

Keywords

  • RDF
  • SPARQL
  • Query optimization
  • Triple filtering
  • Intermediate results