Abstract
We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
Similar content being viewed by others
Notes
The literature on aggregation and aggregate queries frequently refer to these attributes as dimensions. We follow the same convention.
Note that, this is not necessarily \(O_n\), since node identifiers are arbitrarily assigned to help with presentation.
Although dimension list is a set, when the order is important, we specify them as a list enclosed in ( ).
We revised the original 14 to remove type reasoning that gStore does not currently support; the resulting queries return larger result sets since there is no filtering as a result of type reasoning. For completeness, these are included in Online Supplements as Table 15.
References
Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 411–422 (2007)
Abadi, D.J., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18(2), 385–406 (2009)
Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. In: Proceedings of the 19th International World Wide Web Conference, pp. 41–50 (2010)
Bernstein, P.A., Chiu, D.-M.W.: Using semi-joins to solve relational queries. J. ACM 28(1), 25–40 (1981)
Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. In: Proceedings of the 1st Latin American Web Congress, pp. 27–36 (2003)
Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. In: Proceedings of the 1st International Semantic Web Conference, pp. 54–68 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge (2001)
Deppisch, U.: S-tree: a dynamic balanced signature index for office retrieval. In: Proceedings of the 9th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 77–87 (1986)
Faloutsos, C., Christodoulakis, S.: Signature files: an access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2(4), 267–288 (1984)
Gravano, L., Ipeirotis, P.G., Koudas, N., Srivastava, D.: Text joins in an RDBMS for web data integration. In: Proceedings of the 12th International World Wide Web Conference, pp. 90–101 (2003)
Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Pietarinen, L., Srivastava, D.: Using \(q\)-grams in a DBMS for approximate string processing. IEEE Data Eng. Bull. 24(4), 28–34 (2001)
Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Semant. 3(2–3), 158–182 (2005)
Gupta, A., Dallan Quass, V.H.: Aggregate-query processing in data warehousing environments. In: Proceedings of the 21st International Conference on Very Large Data Bases, pp. 358–369 (1995)
Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. In: Proceedings of the 6th International Semantic Web Conference, pp. 211–224 (2007)
Hoffart, J., Suchanek, F.M., Berberich, K., Kelham, E.L., de Melo, G., Weikum, G.: YAGO2: exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of the 20th International World Wide Web Conference, pp. 229–232 (2011)
Hung, E., Deng, Y., Subrahmanian, V.S.: RDF aggregate queries and views. In: Proceedings of the 21st International Conference on Data Engineering, pp. 717–728 (2005)
Johnson, T., Shasha, D.: B-trees with inserts and deletes: why free-at-empty is better than merge-at-half. J. Comput. Syst. Sci. 47(1), 45–76 (1993)
Kitagawa, H., Ishikawa, Y.: False drop analysis of set retrieval with signature files. IEICE Trans. Inf. Syst. E80–D(6), 1–12 (1997)
Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 627–640 (2009)
Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)
Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)
Neumann, T., Weikum, G.: x-RDF-3x: Fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 1(1), 256–263 (2010)
Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. Database Syst. 34(3), 16:1–16:45 (2009)
Seid, D.Y., Mehrotra, S.: Grouping and aggregate queries over semantic web databases. In: Proceedings of the International Conference on Semantic Computing, pp. 775–782 (2007)
Shasha, D., Wang, J.T.-L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: Proceedings of the 21st ACM Symposium on Principles of Database Systems, pp. 39–52 (2002)
Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceedings of the 17th International World Wide Web Conference, pp. 595–604 (2008)
Tousidou, E., Nanopoulos, A., Manolopoulos, Y.: Improved methods for signature-tree construction. Comput. J. 43(4), 301–314 (2000)
Tousidou, E., Bozanis, P., Manolopoulos, Y.: Signature-based structures for objects with set-valued attributes. Inf. Syst. 27(2), 93–121 (2002)
Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. In: Proceedings of the 22nd National Conference on Artificial Intelligence, pp. 1465–1470 (2007)
Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)
Wilkinson, K., Sayers, C. , Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of the 1st Inter national Workshop on Semantic Web and Databases, pp. 131–150 (2003)
Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., Pan, Y.: Efficient indices using graph partitioning in RDF triple stores. In: Proceedings of the 25th International Conference on Data Engineering, pp. 1263–1266 (2009)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 335–346 (2004)
Yuan, P., Liu, P., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)
Acknowledgments
Lei Zou’s work was supported by National Science Foundation of China (NSFC) under Grant No. 61370055 and by CCF-Tencent Open Research Fund. M. Tamer Özsu’s work was supported by Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant. Lei Chen’s work was supported in part by the Hong Kong RGC Project M-HKUST602/12, National Grand Fundamental Research 973 Program of China under Grant No. 2012-CB316200, Microsoft Research Asia Grant, and a Google Faculty Award. Dongyan Zhao was supported by NSFC under Grant No. 61272344 and China 863 Project under Grant No. 2012AA011101.
Author information
Authors and Affiliations
Corresponding author
Additional information
Extended version of paper “gStore: Answering SPARQL Queries via Subgraph Matching” that was presented at 2011 VLDB Conference.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Zou, L., Özsu, M.T., Chen, L. et al. gStore: a graph-based SPARQL query engine. The VLDB Journal 23, 565–590 (2014). https://doi.org/10.1007/s00778-013-0337-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-013-0337-7