The VLDB Journal, Volume 23, Issue 4, pp 565–590

gStore: a graph-based SPARQL query engine

  • Lei Zou
  • M. Tamer Özsu (corresponding author)
  • Lei Chen
  • Xuchuan Shen
  • Ruizhe Huang
  • Dongyan Zhao
Regular Paper


We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets in a uniform and scalable manner. Our approach is graph-based: we store RDF data as a large graph and represent a SPARQL query as a query graph, which converts the query answering problem into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
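The core reduction described above, storing RDF triples as a labelled graph and answering a SPARQL basic graph pattern by subgraph matching, can be illustrated with a naive backtracking matcher. This is a minimal sketch under stated assumptions, not gStore's implementation: triples are plain Python string tuples, terms starting with `?` are variables, and `build_graph`, `match`, `bind`, and `count_by` are hypothetical helper names. It omits the index, pruning rules, and update maintenance entirely, and `count_by` is only a toy stand-in for a `GROUP BY ... COUNT(*)` aggregate.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple

# Assumption: an RDF triple is a plain (subject, predicate, object) string tuple.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: List[Triple]) -> Graph:
    """Store RDF data as a directed, edge-labelled graph: subject -> [(predicate, object)]."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return dict(adj)


def is_var(term: str) -> bool:
    # Assumption: SPARQL-style variables are terms starting with '?'.
    return term.startswith("?")


def bind(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Unify one query term with a data value; return the extended binding or None on conflict."""
    if not is_var(term):
        return binding if term == value else None
    if term in binding:
        return binding if binding[term] == value else None
    extended = dict(binding)
    extended[term] = value
    return extended


def match(graph: Graph, patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Naive backtracking subgraph matching: each triple pattern is one query-graph edge."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    s, p, o = patterns[0]
    # Unbound subject variable: try every vertex with outgoing edges (no index, no pruning).
    # Otherwise start from the single vertex the subject names or is already bound to.
    if is_var(s) and s not in binding:
        candidates = list(graph)
    else:
        candidates = [binding.get(s, s)]
    for subj in candidates:
        for pred, obj in graph.get(subj, []):
            b = bind(s, subj, binding)
            b = bind(p, pred, b) if b is not None else None
            b = bind(o, obj, b) if b is not None else None
            if b is not None:
                yield from match(graph, patterns[1:], b)


def count_by(results: List[Binding], var: str) -> Dict[str, int]:
    """Toy GROUP BY var with COUNT(*) over the match results."""
    counts: Dict[str, int] = defaultdict(int)
    for b in results:
        counts[b[var]] += 1
    return dict(counts)


# Hypothetical toy data, for illustration only.
triples = [
    ("Alice", "knows", "Bob"),
    ("Alice", "knows", "Carol"),
    ("Bob", "knows", "Carol"),
    ("Bob", "livesIn", "Waterloo"),
    ("Carol", "livesIn", "Beijing"),
]
graph = build_graph(triples)
# SELECT ?x ?y ?c WHERE { ?x knows ?y . ?y livesIn ?c }
query = [("?x", "knows", "?y"), ("?y", "livesIn", "?c")]
results = list(match(graph, query))
print(results)
print(count_by(results, "?c"))
```

On the toy data, the query yields three matches, and grouping by `?c` counts two for `Beijing` and one for `Waterloo`. Trying every vertex for an unbound subject is what makes this sketch naive; the index and pruning rules mentioned in the abstract target exactly this kind of search cost.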


RDF, SPARQL, Graph database, Graph matching, Aggregate query



Lei Zou’s work was supported by National Science Foundation of China (NSFC) under Grant No. 61370055 and by CCF-Tencent Open Research Fund. M. Tamer Özsu’s work was supported by Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant. Lei Chen’s work was supported in part by the Hong Kong RGC Project M-HKUST602/12, National Grand Fundamental Research 973 Program of China under Grant No. 2012-CB316200, Microsoft Research Asia Grant, and a Google Faculty Award. Dongyan Zhao was supported by NSFC under Grant No. 61272344 and China 863 Project under Grant No. 2012AA011101.

Supplementary material

Supplementary material 1: 778_2013_337_MOESM1_ESM.pdf (PDF, 200 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Lei Zou (1)
  • M. Tamer Özsu (2)
  • Lei Chen (3)
  • Xuchuan Shen (1)
  • Ruizhe Huang (1)
  • Dongyan Zhao (1)

  1. Institute of Computer Science and Technology, Peking University, Beijing, China
  2. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  3. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
