Storing and Indexing Massive RDF Datasets

  • Yongming Luo
  • François Picalausa
  • George H. L. Fletcher
  • Jan Hidders
  • Stijn Vansummeren
Part of the Data-Centric Systems and Applications book series (DCSA)


The Resource Description Framework (RDF) provides a flexible method for modeling information on the Web [34, 40]. All data items in RDF are uniformly represented as triples of the form (subject, predicate, object), sometimes also referred to as (subject, property, value) triples.
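As a rough illustration of the triple model described above, the following minimal Python sketch stores a few triples and matches them against a triple pattern in which any position may be left unspecified. The `Triple` type, the `match` helper, and the `ex:` sample data are hypothetical names introduced only for this example; they are not taken from the chapter or from any particular RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing Python's built-in "object"


# Hypothetical sample data, invented for illustration only.
triples = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:alice", "ex:name", '"Alice"'),
    Triple("ex:bob", "ex:name", '"Bob"'),
]


def match(data: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every given position.

    A position left as None acts as a wildcard, so this is a linear
    scan over the data rather than an indexed lookup.
    """
    return [
        t for t in data
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# All (subject, "ex:name", value) triples, whatever the subject and value.
names = match(triples, predicate="ex:name")
```

The scan visits every triple on each lookup, which is only workable for tiny datasets; avoiding that cost on massive datasets is the kind of problem storage and indexing schemes aim to address.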


Keywords (added by machine, not by the authors): Inverted Index, Keyword Query, Conjunctive Query, Triple Pattern, Suffix Array



The research of FP is supported by an FNRS/FRIA scholarship. The research of SV is supported by the OSCB project funded by the Brussels Capital Region. The research of GF, JH, and YL is supported by the Netherlands Organisation for Scientific Research (NWO).


References

  1. Abadi, D., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18, 385–406 (2009)
  2. Angles, R., Gutierrez, C.: Survey of graph database models. ACM Comput. Surv. 40, 1:1–1:39 (2008)
  3. Arion, A., Bonifati, A., Manolescu, I., Pugliese, A.: Path summaries and path partitioning in modern XML databases. World Wide Web 11, 117–151 (2008)
  4. Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pp. 41–50. ACM, New York, NY, USA (2010)
  5. Bizer, C., Jentzsch, A., Cyganiak, R.: State of the LOD cloud. Retrieved July 5, 2011
  6. Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. Proceedings of the First Conference on Latin American Web Congress, pp. 27–36. IEEE Computer Society, Washington, DC, USA (2003)
  7. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. International semantic web conference, pp. 54–68. Sardinia, Italy (2002)
  8. Castillo, R.: RDFMatView: indexing RDF data for SPARQL queries. 9th international semantic web conference (ISWC2010), 2010
  9. Center, Q.G.: Bio2RDF., 10 March 2012
  10. Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL-based RDF querying scheme. VLDB, pp. 1216–1227. Trondheim, Norway (2005)
  11. , 10 March 2012
  12. Delbru, R., Campinas, S., Tummarello, G.: Searching web data: an entity retrieval and high-performance indexing model. Web Semant. Sci. Serv. Agents World Wide Web 10, 33–58 (2012)
  13. Dong, X., Halevy, A.Y.: Indexing dataspaces. ACM SIGMOD, Beijing, 2007, pp. 43–54
  14. Elbassuoni, S., Ramanath, M., Schenkel, R., Weikum, G.: Searching RDF graphs with SPARQL and keywords. IEEE Data Eng. Bull. 33(1), 16–24 (2010)
  15. Erling, O.: Towards web scale RDF. SSWS. Karlsruhe, Germany (2008)
  16. Erling, O., Mikhailov, I.: RDF support in the virtuoso DBMS. In: Auer S., Bizer C., Müller C., Zhdanova A.V. (eds.) CSSW, LNI, vol. 113, pp. 59–68. GI (2007)
  17. Fletcher, G.H.L., Beck, P.W.: Scalable indexing of RDF graphs for efficient join processing. CIKM, pp. 1513–1516, Hong Kong, 2009
  18. Fletcher, G.H.L., Van Den Bussche, J., Van Gucht, D., Vansummeren, S.: Towards a theory of search queries. ACM Trans. Database Syst. 35, 28:1–28:33 (2010)
  19. Fletcher, G.H.L., Van Gucht, D., Wu, Y., Gyssens, M., Brenes, S., Paredaens, J.: A methodology for coupling fragments of XPath with structural indexes for XML documents. Inform. Syst. 34(7), 657–670 (2009)
  20. Franklin, M., Halevy, A., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Rec. 34, 27–33 (2005)
  21. Furche, T., Weinzierl, A., Bry, F.: Labeling RDF graphs for linear time and space querying. In: De Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management, pp. 309–339. Springer, Berlin, Heidelberg, New York (2009)
  22. Goasdoué, F., Karanasos, K., Leblay, J., Manolescu, I.: Rdfviews: a storage tuning wizard for rdf applications. In: Huang, J., Koudas, N., Jones, G.J.F., Wu, X., Collins-Thompson, K., An, A. (eds.) CIKM, pp. 1947–1948. ACM, New York (2010)
  23. Goldman, R., Widom, J.: DataGuides: enabling query formulation and optimization in semistructured databases. VLDB, pp. 436–445, Athens, Greece, 1997
  24. Gou, G., Chirkova, R.: Efficiently querying large XML data repositories: a survey. IEEE Trans. Knowl. Data Eng. 19(10), 1381–1403 (2007)
  25. Groppe, S., Groppe, J., Linnemann, V.: Using an index of precomputed joins in order to speed up SPARQL processing. ICEIS, pp. 13–20, Funchal, Madeira, Portugal, 2007
  26. Gyssens, M., Paredaens, J., Van Gucht, D., Fletcher, G.H.L.: Structural characterizations of the semantics of XPath as navigation tool on a document. ACM PODS, pp. 318–327, Chicago, 2006
  27. Haffmans, W.J., Fletcher, G.H.L.: Efficient RDFS entailment in external memory. In: SWWS, pp. 464–473. Hersonissos, Crete, Greece (2011)
  28. Halevy, A.Y.: Answering queries using views: a survey. VLDB J. 10(4), 270–294 (2001)
  29. Halevy, A.Y., Franklin, M.J., Maier, D.: Principles of dataspace systems. PODS, pp. 1–9, Chicago, 2006
  30. Harris, S.: SPARQL query processing with conventional relational database systems. Web Inform. Syst. Eng. WISE 2005 Workshops 3807, 235–244 (2005)
  31. Harris, S., Gibbins, N.: 3store: efficient bulk RDF storage. PSSS1, Proceedings of the First International Workshop on Practical and Scalable Semantic Systems, pp. 1–15, Sanibel Island, Florida, 2003
  32. Harth, A., Decker, S.: Optimized index structures for querying RDF from the web. IEEE LA-WEB, pp. 71–80, Buenos Aires, Argentina (2005)
  33. Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. ISWC. Busan, Korea (2007)
  34. Hayes, P.: RDF Semantics. W3C Recommendation (2004)
  35. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Morgan and Claypool, San Francisco (2011)
  36. Hertel, A., Broekstra, J., Stuckenschmidt, H.: RDF storage and retrieval systems. In: Staab, S., Rudi Studer, D. (eds.) Handbook on Ontologies, International Handbooks on Information Systems, pp. 489–508. Springer, Berlin, Heidelberg (2009)
  37. Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. ACM SIGMOD, pp. 133–144. Madison, WI (2002)
  38. Kaushik, R., Krishnamurthy, R., Naughton, J.F., Ramakrishnan, R.: On the integration of structure indexes and inverted lists. ACM SIGMOD, pp. 779–790. Paris (2004)
  39. Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths in graph-structured data. IEEE ICDE, pp. 129–140. San Jose, CA (2002)
  40. Klyne, G., Carroll, J.J.: Resource description framework (RDF): concepts and abstract syntax. W3C Recommendation (2004)
  41. Kolas, D., Emmons, I., Dean, M.: Efficient linked-list RDF indexing in parliament. In: Fokoue A., Guo Y., Liebig T. (eds.) Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2009), CEUR, vol. 517, pp. 17–32. Washington DC, USA (2009)
  42. Kolas, D., Self, T.: Spatially-augmented knowledgebase. In: Aberer, K., Choi, K.S., Noy, N.F., Allemang, D., Lee, K.I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ISWC/ASWC, Lecture Notes in Computer Science, vol. 4825, pp. 792–801. Springer, Berlin Heidelberg (2007)
  43. Konstantinou, N., Spanos, D.E., Mitrou, N.: Ontology and database mapping: a survey of current implementations and future directions. J. Web Eng. 7(1), 1–24 (2008)
  44. Ladwig, G., Tran, T.: Combining query translation with query answering for efficient keyword search. ESWC, pp. 288–303. Crete (2010)
  45. Lee, J., Pham, M.D., Lee, J., Han, W.S., Cho, H., Yu, H., Lee, J.H.: Processing SPARQL queries with regular expressions in RDF databases. Proceedings of the ACM Fourth International Workshop on Data and Text Mining in Biomedical Informatics, DTMBIO ’10, pp. 23–30. ACM, New York, NY, USA (2010)
  46. Levandoski, J.J., Mokbel, M.F.: RDF data-centric storage. ICWS, pp. 911–918. IEEE (2009)
  47. Li, Q., Moon, B.: Indexing and querying xml data for regular path expressions. Proceedings of the 27th International Conference on very Large data bases, VLDB ’01, pp. 361–370. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2001)
  48. Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Dynamic maintenance of web indexes using landmarks. Proceedings of the 12th International Conference on World Wide Web, WWW ’03, pp. 102–111. ACM, New York, NY, USA (2003)
  49. Ma, L., Su, Z., Pan, Y., Zhang, L., Liu, T.: RStar: an RDF storage and query system for enterprise resource management. ACM CIKM, pp. 484–491. Washington, D.C. (2004)
  50. del Mar Roldán García, M., Montes, J.F.A.: A survey on disk oriented querying and reasoning on the semantic web. IEEE ICDE workshop SWDB. Atlanta (2006)
  51. Matono, A., Amagasa, T., Yoshikawa, M., Uemura, S.: An indexing scheme for RDF and RDF schema based on suffix arrays. SWDB, pp. 151–168. Berlin (2003)
  52. Matono, A., Amagasa, T., Yoshikawa, M., Uemura, S.: A path-based relational RDF database. ADC, pp. 95–103. Newcastle, Australia (2005)
  53. McGlothlin, J.P., Khan, L.R.: Rdfkb: efficient support for rdf inference queries and knowledge management. In: Desai B.C., Saccà D., Greco S. (eds.) IDEAS, ACM international conference proceeding series, pp. 259–266. ACM (2009)
  54. Milo, T., Suciu, D.: Index structures for path expressions. ICDT, pp. 277–295. Jerusalem (1999)
  55. Minack, E., Sauermann, L., Grimnes, G., Fluit, C.: The sesame lucenesail: Rdf queries with full-text search. Technical report, NEPOMUK Consortium (2008)
  56. Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. VLDB. Auckland, New Zealand (2008)
  57. Neumann, T., Weikum, G.: x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 3, 256–263 (2010)
  58. Oren, E., Kotoulas, S., Anadiotis, G., Siebes, R., ten Teije, A., van Harmelen, F.: Marvin: distributed reasoning over large-scale semantic web data. J. Web Semant. 7(4), 305–316 (2009)
  59. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C Recommendation (2008)
  60. Ramanan, P.: Covering indexes for XML queries: bisimulation – simulation – negation. Proceedings of the 29th International Conference on very Large Data Bases – Volume 29, VLDB ’2003, pp. 165–176. VLDB Endowment (2003)
  61. Rao, P., Moon, B.: Sequencing XML data and query twigs for fast pattern matching. ACM Trans. Database Syst. 31, 299–345 (2006)
  62. Rohloff, K., Dean, M., Emmons, I., Ryder, D., Sumner, J.: An evaluation of triple-store technologies for large data stores. OTM 2007 Workshop SSWS, pp. 1105–1114. Vilamoura, Portugal (2007)
  63. Rohloff, K., Schantz, R.E.: Clause-iteration with mapreduce to scalably query datagraphs in the shard graph-store. Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing, DIDC ’11, pp. 35–44. ACM, New York, NY, USA (2011)
  64. Sakr, S., Al-Naymat, G.: Relational processing of RDF queries: a survey. ACM SIGMOD Record 38, 23–28 (2009)
  65. Schmidt, M., Hornung, T., Küchlin, N., Lausen, G., Pinkel, C.: An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario. ISWC, pp. 82–97. Karlsruhe (2008)
  66. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP²Bench: a SPARQL performance benchmark. IEEE ICDE. Shanghai (2009)
  67. Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N., Manegold, S.: Column-store support for RDF data management: not all swans are white. Proc. VLDB Endow. 1, 1553–1563 (2008)
  68. Sintek, M., Kiesel, M.: RDFBroker: a signature-based high-performance RDF store. ESWC, pp. 363–377. Budva, Montenegro (2006)
  69. Theoharis, Y., Christophides, V., Karvounarakis, G.: Benchmarking database representations of RDF/S stores. In: Gil Y., Motta E., Benjamins V.R., Musen M.A. (eds.) International semantic web conference. Lecture Notes in Computer Science, vol. 3729, pp. 685–701. Springer, Berlin, Heidelberg, New York (2005)
  70. Tran, T., Ladwig, G.: Structure index for RDF data. Workshop on Semantic Data Management (SemData@ VLDB) (2010)
  71. Tran, T., Wang, H., Rudolph, S., Cimiano, P.: Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. ICDE, pp. 405–416. Shanghai (2009)
  72. Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., Decker, S.: live views on the web of data. Web Semant. Sci. Serv. Agents World Wide Web 8(4), 355–364 (2010). DOI 10.1016/j.websem.2010.08.003
  73. Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. AAAI, pp. 1465–1470. Vancouver, B.C. (2007)
  74. Valduriez, P.: Join indices. ACM Trans. Database Syst. 12, 218–246 (1987)
  75. W3C SWEO Community Project: Linking open data., 10 March 2012
  76. Wang, H., Liu, Q., Penin, T., Fu, L., Zhang, L., Tran, T., Yu, Y., Pan, Y.: Semplore: a scalable IR approach to search the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 177–188 (2009)
  77. Wang, X., Wang, S., Pufeng, D., Zhiyong, F.: Path summaries and path partitioning in modern XML databases. Int. J. Modern Edu. Comput. Sci. 3, 55–61 (2011)
  78. Weikum, G., Theobald, M.: From information to knowledge: harvesting entities and relationships from web sources. PODS, pp. 65–76. Indianapolis (2010)
  79. Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. VLDB, Auckland, New Zealand (2008)
  80. Wilkinson, K.: Jena property table implementation. SSWS, pp. 35–46. Athens, Georgia, USA (2006)
  81. Witten, I., Moffat, A., Bell, T.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, Los Altos, CA (1999)
  82. Wood, D., Gearon, P., Adams, T.: Kowari: a platform for semantic web storage and analysis. XTech. Amsterdam (2005)
  83. Wu, G., Li, J.: Managing large scale native RDF semantic repository from the graph model perspective. ACM SIGMOD Workshop IDAR, pp. 85–86. Beijing (2007)
  84. Wu, Y., Gucht, D.V., Gyssens, M., Paredaens, J.: A study of a positive fragment of path queries: Expressiveness, normal form and minimization. Comput. J. 54(7), 1091–1118 (2011)
  85. Yang, M., Wu, G.: Caching intermediate result of sparql queries. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) WWW (Companion Volume), pp. 159–160. ACM, New York (Los Angeles, CA) (2011)
  86. Zou, L., Mo, J., Chen, L., Özsu, M.T., Zhao, D.: gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endowment 4(8), 482–493 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yongming Luo
  • François Picalausa (1)
  • George H. L. Fletcher (2)
  • Jan Hidders (3)
  • Stijn Vansummeren (4)

  1. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
  2. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
  3. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
  4. Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
