The VLDB Journal, Volume 18, Issue 2, pp 385–406

SW-Store: a vertically partitioned DBMS for Semantic Web data management

  • Daniel J. Abadi
  • Adam Marcus
  • Samuel R. Madden
  • Kate Hollenbach
Special Issue Paper

Abstract

Efficient management of RDF data is an important prerequisite for realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance of RDF databases and consider a recent suggestion, “property tables”. We then discuss practically and empirically why this solution has undesirable features. As an improvement, we propose an alternative solution: vertically partitioning the RDF data. We compare the performance of vertical partitioning with prior art on queries generated by a Web-based RDF browser over a large-scale (more than 50 million triples) catalog of library data. Our results show that a vertically partitioned schema achieves similar performance to the property table technique while being much simpler to design. Further, if a column-oriented DBMS (a database architected specially for the vertically partitioned case) is used instead of a row-oriented DBMS, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds. Encouraged by these results, we describe the architecture of SW-Store, a new DBMS we are actively building that implements these techniques to achieve high performance RDF data management.
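To make the central idea concrete, here is a minimal sketch of vertical partitioning, assuming the usual reading of the term: the triples are split into one two-column (subject, object) table per property, with each table kept sorted by subject. The function names, the in-memory representation, and the sample triples are illustrative assumptions, not the SW-Store implementation or the Barton data.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split (subject, property, object) triples into one table per property.

    Each table is a list of (subject, object) rows sorted by subject,
    mirroring the two-column layout assumed in the lead-in.
    """
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    return {prop: sorted(rows) for prop, rows in tables.items()}


def subjects_matching(tables, constraints):
    """Return the subjects that carry every (property, object) pair given.

    Only the tables for the named properties are read; all other
    properties are never touched, which is the point of the layout.
    """
    result = None
    for prop, obj in constraints:
        matches = {s for s, o in tables.get(prop, []) if o == obj}
        result = matches if result is None else result & matches
    return sorted(result or [])


# Hypothetical sample data in the spirit of a library catalog.
triples = [
    ("book1", "type", "Text"),
    ("book1", "title", "RDF Basics"),
    ("book1", "language", "fre"),
    ("book2", "type", "Text"),
    ("book2", "title", "Column Stores"),
    ("cd1", "type", "Audio"),
]

tables = vertically_partition(triples)
french_texts = subjects_matching(tables, [("type", "Text"), ("language", "fre")])
```

In this sketch a query that constrains two properties reads two small tables and intersects their subjects, rather than scanning a single table of all triples. A real system would use joins over sorted or indexed columns instead of Python sets.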

Keywords

Query Time · Property Table · Query Plan · Path Expression · Vertical Partition

These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Daniel J. Abadi (1)
  • Adam Marcus (2)
  • Samuel R. Madden (2)
  • Kate Hollenbach (2)

  1. Yale University, New Haven, USA
  2. MIT, Cambridge, USA
