The RDF-3X engine for scalable management of RDF data

Neumann, Thomas; Weikum, Gerhard

doi:10.1007/s00778-009-0165-y

Special Issue Paper
Published: 01 September 2009

Volume 19, pages 91–113 (2010)
The VLDB Journal

Abstract

RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred, and instead applied to compact differential indexes which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching, many-way star-joins, and long path-joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.
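
To make the indexing idea in the abstract concrete, the following is a minimal Python sketch, not the RDF-3X implementation. It assumes plain in-memory sorted lists in place of the compressed indexes, omits the binary and unary projections, and uses hypothetical helper names (`build_indexes`, `scan`, `merge_join`) and made-up sample triples. It illustrates why storing every permutation of subject, property, and object lets each triple pattern be read in an order that suits a merge join.

```python
from bisect import bisect_left
from itertools import permutations

# Column positions within a (subject, property, object) triple.
POS = {"s": 0, "p": 1, "o": 2}


def build_indexes(triples):
    """Return one sorted list per permutation of s, p, o (six in total)."""
    indexes = {}
    for perm in permutations("spo"):
        order = "".join(perm)
        # Reorder each triple's columns to match the permutation, then sort.
        indexes[order] = sorted(
            tuple(t[POS[c]] for c in order) for t in set(triples)
        )
    return indexes


def scan(indexes, order, prefix):
    """Yield, in sorted order, the entries of index `order` starting with `prefix`."""
    index = indexes[order]
    i = bisect_left(index, prefix)  # first entry >= prefix
    while i < len(index) and index[i][: len(prefix)] == prefix:
        yield index[i]
        i += 1


def merge_join(left, right, key_left, key_right):
    """Join two inputs that are already sorted on their join keys."""
    left, right = list(left), list(right)
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left) and key_left(left[i]) == kl:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end


# Hypothetical sample data (not taken from the paper).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "livesIn", "berlin"),
    ("bob", "livesIn", "paris"),
    ("carol", "knows", "alice"),
]
indexes = build_indexes(triples)

# Star join on the subject: ?x knows ?y . ?x livesIn ?c
# The "pso" index returns both patterns sorted by subject, so no sorting is needed.
knows = scan(indexes, "pso", ("knows",))
lives = scan(indexes, "pso", ("livesIn",))
result = [
    (l[1], l[2], r[2])
    for l, r in merge_join(knows, lives, lambda t: t[1], lambda t: t[1])
]
print(result)  # [('alice', 'bob', 'berlin')]
```

Because both scans use the same property-first ordering, their outputs arrive sorted on the shared variable, which is the precondition for the merge join. A different query shape, such as a join on the object, would read from another permutation instead.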

Author information

Authors and Affiliations

Max-Planck-Institut für Informatik, Saarbrücken, Germany
Thomas Neumann & Gerhard Weikum

Corresponding author

Correspondence to Thomas Neumann.

About this article

Cite this article

Neumann, T., Weikum, G. The RDF-3X engine for scalable management of RDF data. The VLDB Journal 19, 91–113 (2010). https://doi.org/10.1007/s00778-009-0165-y

Received: 13 January 2009
Revised: 19 June 2009
Accepted: 07 August 2009
Published: 01 September 2009
Issue Date: February 2010
DOI: https://doi.org/10.1007/s00778-009-0165-y
