Storing and Indexing Massive RDF Datasets

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

The Resource Description Framework (RDF) provides a flexible method for modeling information on the Web [34, 40]. All data items in RDF are uniformly represented as triples of the form (subject, predicate, object), sometimes also referred to as (subject, property, value) triples.
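
As a rough illustration of the triple model described above, the following minimal Python sketch represents triples as named tuples and looks them up with a wildcard pattern. The `Triple` class, the `match` helper, and the `ex:` identifiers are hypothetical names chosen for this example; they are not taken from the chapter or from any RDF library.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    object: str


# Hypothetical example data; the "ex:" names are made up for illustration.
triples = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:alice", "ex:name", '"Alice"'),
    Triple("ex:bob", "ex:name", '"Bob"'),
]


def match(data: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples that agree with every given component.

    A component left as None acts as a wildcard. This is a linear scan,
    not an index; it only shows the shape of a triple-pattern lookup.
    """
    return [
        t for t in data
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    ]


# All (subject, property, value) triples whose property is ex:name.
names = match(triples, p="ex:name")
```

Because every statement has the same three-part shape, a lookup reduces to fixing some components and leaving the others open.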

References

  1. Abadi, D., Marcus, A., Madden, S., Hollenbach, K.: SW-Store: a vertically partitioned DBMS for semantic web data management. VLDB J. 18, 385–406 (2009)

  2. Angles, R., Gutierrez, C.: Survey of graph database models. ACM Comput. Surv. 40, 1:1–1:39 (2008)

  3. Arion, A., Bonifati, A., Manolescu, I., Pugliese, A.: Path summaries and path partitioning in modern XML databases. World Wide Web 11, 117–151 (2008)

  4. Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “bit” loaded: a scalable lightweight join query processor for RDF data. Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pp. 41–50. ACM, New York, NY, USA (2010)

  5. Bizer, C., Jentzsch, A., Cyganiak, R.: State of the LOD cloud. http://www4.wiwiss.fu-berlin.de/lodcloud/state/. Retrieved July 5, 2011

  6. Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. Proceedings of the First Conference on Latin American Web Congress, pp. 27–36. IEEE Computer Society, Washington, DC, USA (2003)

  7. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: a generic architecture for storing and querying RDF and RDF schema. International semantic web conference, pp. 54–68. Sardinia, Italy (2002)

  8. Castillo, R.: RDFMatView: indexing RDF data for SPARQL queries. 9th international semantic web conference (ISWC2010), 2010

  9. Center, Q.G.: Bio2RDF. http://bio2rdf.org/, 10 March 2012

  10. Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL-based RDF querying scheme. VLDB, pp. 1216–1227. Trondheim, Norway (2005)

  11. Data.gov. http://www.data.gov, 10 March 2012

  12. Delbru, R., Campinas, S., Tummarello, G.: Searching web data: an entity retrieval and high-performance indexing model. Web Semant. Sci. Serv. Agents World Wide Web 10, 33–58 (2012)

  13. Dong, X., Halevy, A.Y.: Indexing dataspaces. ACM SIGMOD, Beijing, 2007, pp. 43–54

  14. Elbassuoni, S., Ramanath, M., Schenkel, R., Weikum, G.: Searching RDF graphs with SPARQL and keywords. IEEE Data Eng. Bull. 33(1), 16–24 (2010)

  15. Erling, O.: Towards web scale RDF. SSWS. Karlsruhe, Germany (2008)

  16. Erling, O., Mikhailov, I.: RDF support in the Virtuoso DBMS. In: Auer, S., Bizer, C., Müller, C., Zhdanova, A.V. (eds.) CSSW, LNI, vol. 113, pp. 59–68. GI (2007)

  17. Fletcher, G.H.L., Beck, P.W.: Scalable indexing of RDF graphs for efficient join processing. CIKM, pp. 1513–1516, Hong Kong, 2009

  18. Fletcher, G.H.L., Van Den Bussche, J., Van Gucht, D., Vansummeren, S.: Towards a theory of search queries. ACM Trans. Database Syst. 35, 28:1–28:33 (2010)

  19. Fletcher, G.H.L., Van Gucht, D., Wu, Y., Gyssens, M., Brenes, S., Paredaens, J.: A methodology for coupling fragments of XPath with structural indexes for XML documents. Inform. Syst. 34(7), 657–670 (2009)

  20. Franklin, M., Halevy, A., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Rec. 34, 27–33 (2005)

  21. Furche, T., Weinzierl, A., Bry, F.: Labeling RDF graphs for linear time and space querying. In: De Virgilio, R., Giunchiglia, F., Tanca, L. (eds.) Semantic Web Information Management, pp. 309–339. Springer, Berlin, Heidelberg, New York (2009)

  22. Goasdoué, F., Karanasos, K., Leblay, J., Manolescu, I.: RDFViews: a storage tuning wizard for RDF applications. In: Huang, J., Koudas, N., Jones, G.J.F., Wu, X., Collins-Thompson, K., An, A. (eds.) CIKM, pp. 1947–1948. ACM, New York (2010)

  23. Goldman, R., Widom, J.: DataGuides: enabling query formulation and optimization in semistructured databases. VLDB, pp. 436–445, Athens, Greece, 1997

  24. Gou, G., Chirkova, R.: Efficiently querying large XML data repositories: a survey. IEEE Trans. Knowl. Data Eng. 19(10), 1381–1403 (2007)

  25. Groppe, S., Groppe, J., Linnemann, V.: Using an index of precomputed joins in order to speed up SPARQL processing. ICEIS, pp. 13–20, Funchal, Madeira, Portugal, 2007

  26. Gyssens, M., Paredaens, J., Van Gucht, D., Fletcher, G.H.L.: Structural characterizations of the semantics of XPath as navigation tool on a document. ACM PODS, pp. 318–327, Chicago, 2006

  27. Haffmans, W.J., Fletcher, G.H.L.: Efficient RDFS entailment in external memory. In: SWWS, pp. 464–473. Hersonissos, Crete, Greece (2011)

  28. Halevy, A.Y.: Answering queries using views: a survey. VLDB J. 10(4), 270–294 (2001)

  29. Halevy, A.Y., Franklin, M.J., Maier, D.: Principles of dataspace systems. PODS, pp. 1–9, Chicago, 2006

  30. Harris, S.: SPARQL query processing with conventional relational database systems. Web Inform. Syst. Eng. WISE 2005 Workshops 3807, 235–244 (2005)

  31. Harris, S., Gibbins, N.: 3store: efficient bulk RDF storage. PSSS1, Proceedings of the First International Workshop on Practical and Scalable Semantic Systems, pp. 1–15, Sanibel Island, Florida, 2003

  32. Harth, A., Decker, S.: Optimized index structures for querying RDF from the web. IEEE LA-WEB, pp. 71–80, Buenos Aires, Argentina (2005)

  33. Harth, A., Umbrich, J., Hogan, A., Decker, S.: YARS2: a federated repository for querying graph structured data from the web. ISWC. Busan, Korea (2007)

  34. Hayes, P.: RDF Semantics. W3C Recommendation (2004)

  35. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Morgan and Claypool, San Francisco (2011)

  36. Hertel, A., Broekstra, J., Stuckenschmidt, H.: RDF storage and retrieval systems. In: Staab, S., Studer, R. (eds.) Handbook on Ontologies, International Handbooks on Information Systems, pp. 489–508. Springer, Berlin, Heidelberg (2009)

  37. Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. ACM SIGMOD, pp. 133–144. Madison, WI (2002)

  38. Kaushik, R., Krishnamurthy, R., Naughton, J.F., Ramakrishnan, R.: On the integration of structure indexes and inverted lists. ACM SIGMOD, pp. 779–790. Paris (2004)

  39. Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths in graph-structured data. IEEE ICDE, pp. 129–140. San Jose, CA (2002)

  40. Klyne, G., Carroll, J.J.: Resource description framework (RDF): concepts and abstract syntax. W3C Recommendation (2004)

  41. Kolas, D., Emmons, I., Dean, M.: Efficient linked-list RDF indexing in parliament. In: Fokoue A., Guo Y., Liebig T. (eds.) Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2009), CEUR, vol. 517, pp. 17–32. Washington DC, USA (2009)

  42. Kolas, D., Self, T.: Spatially-augmented knowledgebase. In: Aberer, K., Choi, K.S., Noy, N.F., Allemang, D., Lee, K.I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ISWC/ASWC, Lecture Notes in Computer Science, vol. 4825, pp. 792–801. Springer, Berlin, Heidelberg (2007)

  43. Konstantinou, N., Spanos, D.E., Mitrou, N.: Ontology and database mapping: a survey of current implementations and future directions. J. Web Eng. 7(1), 1–24 (2008)

  44. Ladwig, G., Tran, T.: Combining query translation with query answering for efficient keyword search. ESWC, pp. 288–303. Crete (2010)

  45. Lee, J., Pham, M.D., Lee, J., Han, W.S., Cho, H., Yu, H., Lee, J.H.: Processing SPARQL queries with regular expressions in RDF databases. Proceedings of the ACM Fourth International Workshop on Data and Text Mining in Biomedical Informatics, DTMBIO ’10, pp. 23–30. ACM, New York, NY, USA (2010)

  46. Levandoski, J.J., Mokbel, M.F.: RDF data-centric storage. ICWS, pp. 911–918. IEEE (2009)

  47. Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. Proceedings of the 27th International Conference on Very Large Data Bases, VLDB ’01, pp. 361–370. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2001)

  48. Lim, L., Wang, M., Padmanabhan, S., Vitter, J.S., Agarwal, R.: Dynamic maintenance of web indexes using landmarks. Proceedings of the 12th International Conference on World Wide Web, WWW ’03, pp. 102–111. ACM, New York, NY, USA (2003)

  49. Ma, L., Su, Z., Pan, Y., Zhang, L., Liu, T.: RStar: an RDF storage and query system for enterprise resource management. ACM CIKM, pp. 484–491. Washington, D.C. (2004)

  50. del Mar Roldán García, M., Montes, J.F.A.: A survey on disk oriented querying and reasoning on the semantic web. IEEE ICDE workshop SWDB. Atlanta (2006)

  51. Matono, A., Amagasa, T., Yoshikawa, M., Uemura, S.: An indexing scheme for RDF and RDF schema based on suffix arrays. SWDB, pp. 151–168. Berlin (2003)

  52. Matono, A., Amagasa, T., Yoshikawa, M., Uemura, S.: A path-based relational RDF database. ADC, pp. 95–103. Newcastle, Australia (2005)

  53. McGlothlin, J.P., Khan, L.R.: RDFKB: efficient support for RDF inference queries and knowledge management. In: Desai, B.C., Saccà, D., Greco, S. (eds.) IDEAS, ACM international conference proceeding series, pp. 259–266. ACM (2009)

  54. Milo, T., Suciu, D.: Index structures for path expressions. ICDT, pp. 277–295. Jerusalem (1999)

  55. Minack, E., Sauermann, L., Grimnes, G., Fluit, C.: The Sesame LuceneSail: RDF queries with full-text search. Technical report, NEPOMUK Consortium (2008)

  56. Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. VLDB. Auckland, New Zealand (2008)

  57. Neumann, T., Weikum, G.: x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 3, 256–263 (2010)

  58. Oren, E., Kotoulas, S., Anadiotis, G., Siebes, R., ten Teije, A., van Harmelen, F.: Marvin: distributed reasoning over large-scale semantic web data. J. Web Semant. 7(4), 305–316 (2009)

  59. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C Recommendation (2008)

  60. Ramanan, P.: Covering indexes for XML queries: bisimulation – simulation – negation. Proceedings of the 29th International Conference on Very Large Data Bases – Volume 29, VLDB ’03, pp. 165–176. VLDB Endowment (2003)

  61. Rao, P., Moon, B.: Sequencing XML data and query twigs for fast pattern matching. ACM Trans. Database Syst. 31, 299–345 (2006)

  62. Rohloff, K., Dean, M., Emmons, I., Ryder, D., Sumner, J.: An evaluation of triple-store technologies for large data stores. OTM 2007 Workshop SSWS, pp. 1105–1114. Vilamoura, Portugal (2007)

  63. Rohloff, K., Schantz, R.E.: Clause-iteration with MapReduce to scalably query datagraphs in the SHARD graph-store. Proceedings of the Fourth International Workshop on Data-Intensive Distributed Computing, DIDC ’11, pp. 35–44. ACM, New York, NY, USA (2011)

  64. Sakr, S., Al-Naymat, G.: Relational processing of RDF queries: a survey. ACM SIGMOD Record 38, 23–28 (2009). http://www.sigmod.org/publications/sigmod-record/0912

  65. Schmidt, M., Hornung, T., Küchlin, N., Lausen, G., Pinkel, C.: An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario. ISWC, pp. 82–97. Karlsruhe (2008)

  66. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP²Bench: a SPARQL performance benchmark. IEEE ICDE. Shanghai (2009)

  67. Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N., Manegold, S.: Column-store support for RDF data management: not all swans are white. Proc. VLDB Endow. 1, 1553–1563 (2008)

  68. Sintek, M., Kiesel, M.: RDFBroker: a signature-based high-performance RDF store. ESWC, pp. 363–377. Budva, Montenegro (2006)

  69. Theoharis, Y., Christophides, V., Karvounarakis, G.: Benchmarking database representations of RDF/S stores. In: Gil Y., Motta E., Benjamins V.R., Musen M.A. (eds.) International semantic web conference. Lecture Notes in Computer Science, vol. 3729, pp. 685–701. Springer, Berlin, Heidelberg, New York (2005)

  70. Tran, T., Ladwig, G.: Structure index for RDF data. Workshop on Semantic Data Management (SemData@ VLDB) (2010)

  71. Tran, T., Wang, H., Rudolph, S., Cimiano, P.: Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. ICDE, pp. 405–416. Shanghai (2009)

  72. Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., Decker, S.: Sig.ma: live views on the web of data. Web Semant. Sci. Serv. Agents World Wide Web 8(4), 355–364 (2010). DOI 10.1016/j.websem.2010.08.003

  73. Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: a graph based RDF index. AAAI, pp. 1465–1470. Vancouver, B.C. (2007)

  74. Valduriez, P.: Join indices. ACM Trans. Database Syst. 12, 218–246 (1987)

  75. W3C SWEO Community Project: Linking open data. http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData, 10 March 2012

  76. Wang, H., Liu, Q., Penin, T., Fu, L., Zhang, L., Tran, T., Yu, Y., Pan, Y.: Semplore: a scalable IR approach to search the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 177–188 (2009)

  77. Wang, X., Wang, S., Pufeng, D., Zhiyong, F.: Path summaries and path partitioning in modern XML databases. Int. J. Modern Edu. Comput. Sci. 3, 55–61 (2011)

  78. Weikum, G., Theobald, M.: From information to knowledge: harvesting entities and relationships from web sources. PODS, pp. 65–76. Indianapolis (2010)

  79. Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. VLDB, Auckland, New Zealand (2008)

  80. Wilkinson, K.: Jena property table implementation. SSWS, pp. 35–46. Athens, Georgia, USA (2006)

  81. Witten, I., Moffat, A., Bell, T.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, Los Altos, CA (1999)

  82. Wood, D., Gearon, P., Adams, T.: Kowari: a platform for semantic web storage and analysis. XTech. Amsterdam (2005)

  83. Wu, G., Li, J.: Managing large scale native RDF semantic repository from the graph model perspective. ACM SIGMOD Workshop IDAR, pp. 85–86. Beijing (2007)

  84. Wu, Y., Gucht, D.V., Gyssens, M., Paredaens, J.: A study of a positive fragment of path queries: Expressiveness, normal form and minimization. Comput. J. 54(7), 1091–1118 (2011)

  85. Yang, M., Wu, G.: Caching intermediate result of SPARQL queries. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) WWW (Companion Volume), pp. 159–160. ACM, New York (Los Angeles, CA) (2011)

  86. Zou, L., Mo, J., Chen, L., Özsu, M.T., Zhao, D.: gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endowment 4(8), 482–493 (2011)

Acknowledgements

The research of FP is supported by an FNRS/FRIA scholarship. The research of SV is supported by the OSCB project funded by the Brussels Capital Region. The research of GF, JH, and YL is supported by the Netherlands Organisation for Scientific Research (NWO).

Author information

Corresponding author

Correspondence to Yongming Luo.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Luo, Y., Picalausa, F., Fletcher, G.H.L., Hidders, J., Vansummeren, S. (2012). Storing and Indexing Massive RDF Datasets. In: De Virgilio, R., Guerra, F., Velegrakis, Y. (eds) Semantic Search over the Web. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25008-8_2

  • DOI: https://doi.org/10.1007/978-3-642-25008-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25007-1

  • Online ISBN: 978-3-642-25008-8

  • eBook Packages: Computer Science (R0)
