Abstract
Different algorithms exist to compute the result of a logical operator such as AND, OPT, or SORT. A physical operator implements one of these algorithms. Physical operators differ in the constraints they place on their input data (for example, that the input must be sorted), and some are faster than others for particular kinds of input, for example when the input data fit into main memory. The context of an operator can be described by estimates of the properties of its input data. For each logical operator in the operator graph, physical optimization aims to choose the physical operator with the lowest estimated execution time in that operator's context.
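The selection step described above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the class names, the three candidate join operators, and the cost formulas are hypothetical placeholders, chosen only to show how input constraints and estimated costs could drive the choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InputEstimate:
    """Estimated properties of an operator's input data (its context)."""
    left_rows: int
    right_rows: int
    sorted_on_join_key: bool
    fits_in_memory: bool


@dataclass
class PhysicalOperator:
    name: str
    applicable: Callable[[InputEstimate], bool]  # constraint on the input data
    cost: Callable[[InputEstimate], float]       # hypothetical cost estimate


# Hypothetical physical operators for one logical join operator.
# The cost formulas are illustrative assumptions, not measured values.
JOIN_CANDIDATES: List[PhysicalOperator] = [
    PhysicalOperator(
        "merge_join",
        applicable=lambda e: e.sorted_on_join_key,
        cost=lambda e: e.left_rows + e.right_rows,
    ),
    PhysicalOperator(
        "hash_join",
        applicable=lambda e: e.fits_in_memory,
        cost=lambda e: 1.5 * (e.left_rows + e.right_rows),
    ),
    PhysicalOperator(
        "nested_loop_join",
        applicable=lambda e: True,
        cost=lambda e: e.left_rows * e.right_rows,
    ),
]


def choose_physical_operator(candidates, estimate):
    """Return the applicable candidate with the lowest estimated cost."""
    usable = [op for op in candidates if op.applicable(estimate)]
    return min(usable, key=lambda op: op.cost(estimate))
```

With sorted inputs this sketch selects the merge join; with unsorted inputs that fit into main memory it falls back to the hash join.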
In this chapter, we describe the physical operators and present our new approaches to efficient RDF data management and join optimization, both for small datasets and for large-scale datasets with over one billion triples.
For small datasets, where the data can be indexed in main memory, in-memory indices can significantly speed up query processing because, once the data are loaded, query processing requires no disk accesses. B+-trees suit disk-based indices of large-scale datasets because they are designed for block-wise sequential disk access. For main-memory indices, hash indices are preferable: an index access takes constant time, as only a hash function must be applied to the key to retrieve the main-memory address of the indexed element. Therefore, we use hash indices to manage small RDF datasets. Based on the triple nature of RDF data, we create seven hash indices in order to retrieve in-memory RDF data quickly. On the basis of SPARQL-specific properties and the seven indices, we develop a new, efficient approach to computing joins by dynamically restricting triple patterns. A performance evaluation demonstrates that the new approach outperforms other state-of-the-art in-memory databases.
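As a rough illustration of the in-memory approach, the following Python sketch builds one hash index per non-empty combination of subject, predicate, and object (seven in total) and evaluates a join by substituting already-bound variables into the next triple pattern before the index lookup. This is one plausible reading of the abstract, not the chapter's implementation: the `TripleStore` class, the `?`-prefixed variable convention, and the `join` function are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

POSITIONS = (0, 1, 2)  # subject, predicate, object


def is_var(term):
    """Assumed convention: variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


class TripleStore:
    def __init__(self):
        self.triples = []
        # One hash index per non-empty subset of positions:
        # S, P, O, SP, SO, PO, SPO (seven indices).
        self.indices = {
            combo: defaultdict(list)
            for size in (1, 2, 3)
            for combo in combinations(POSITIONS, size)
        }

    def add(self, triple):
        if triple in self.indices[POSITIONS]:
            return  # RDF graphs are sets: ignore duplicate triples
        self.triples.append(triple)
        for combo, index in self.indices.items():
            index[tuple(triple[i] for i in combo)].append(triple)

    def match(self, pattern):
        """Return all triples matching a pattern via a single hash lookup."""
        bound = tuple(i for i in POSITIONS if not is_var(pattern[i]))
        if not bound:
            return list(self.triples)
        key = tuple(pattern[i] for i in bound)
        return list(self.indices[bound].get(key, []))


def join(store, patterns):
    """Join triple patterns left to right, restricting each pattern
    with the variable bindings computed so far."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            restricted = tuple(
                binding.get(t, t) if is_var(t) else t for t in pattern
            )
            for triple in store.match(restricted):
                new_binding = dict(binding)
                consistent = True
                for term, value in zip(restricted, triple):
                    # A variable repeated within one pattern must bind
                    # to the same value at every position.
                    if is_var(term) and new_binding.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(new_binding)
        solutions = next_solutions
    return solutions
```

Because every combination of bound positions has its own index, each restricted pattern is answered by one hash lookup; only a pattern with no bound position at all falls back to scanning the full triple list.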
Since Semantic Web datasets are becoming increasingly large, developing efficient techniques to speed up querying of large-scale Semantic Web data is a key issue for Semantic Web applications. Relational database research has shown that merge joins are the fastest join algorithms on large-scale data when the data are already sorted. Therefore, recent approaches focus on presorting Semantic Web data during index construction, so that for some joins the fast merge join can be used without a sorting phase at runtime. When the data for succeeding joins become unsorted, the hash join is typically used. In this chapter, we propose a sorting numbering scheme for large RDF datasets, on the basis of which we can quickly sort any intermediate and final query results. With our sorting numbering scheme, all joins can be computed using the merge join with a fast sorting phase. Besides being a significant benefit to merge joins, our fast sorting technique can also remarkably speed up the elimination of duplicates. Our experiments show that a merge join using our fast sorting technique greatly outperforms the hash join and that our sorting numbering scheme, integrated into any index approach, significantly speeds up querying large-scale Semantic Web data.
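The general idea of a merge join preceded by a sorting phase can be sketched in Python as follows. The abstract does not specify the sorting numbering scheme, so the sketch assumes a simple stand-in: every RDF term receives an integer id that preserves the lexical order of the terms, so that intermediate results can be sorted by comparing integers. The function names are hypothetical, and Python's built-in `sorted` stands in for the chapter's fast sorting technique.

```python
def build_order_preserving_ids(terms):
    """Assumed numbering: ids follow the lexical order of the terms."""
    return {term: i for i, term in enumerate(sorted(set(terms)))}


def merge_join(left, right, key_left, key_right):
    """Join two lists of tuples that are already sorted on their join column."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = left[i][key_left], right[j][key_right]
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal join keys on both sides.
            i_end = i
            while i_end < len(left) and left[i_end][key_left] == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == kl:
                j_end += 1
            # Emit every combination within the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    result.append(l_row + r_row)
            i, j = i_end, j_end
    return result


def sort_merge_join(left, right, key_left, key_right):
    """Sort both inputs on their integer join column, then merge."""
    left_sorted = sorted(left, key=lambda row: row[key_left])
    right_sorted = sorted(right, key=lambda row: row[key_right])
    return merge_join(left_sorted, right_sorted, key_left, key_right)


def distinct_sorted(rows):
    """Eliminate duplicates from fully sorted rows in one pass."""
    result = []
    for row in rows:
        if not result or result[-1] != row:
            result.append(row)
    return result
```

Once rows consist of integer ids, duplicates become adjacent after sorting, which is why a fast sort also helps duplicate elimination: `distinct_sorted(sorted(rows))` needs only one pass over the sorted rows.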
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Groppe, S. (2011). Physical Optimization. In: Data Management and Query Processing in Semantic Web Databases. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19357-6_6
DOI: https://doi.org/10.1007/978-3-642-19357-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19356-9
Online ISBN: 978-3-642-19357-6
eBook Packages: Computer Science (R0)