Partitioned Indexes for Entity Search over RDF Knowledge Bases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7238)

Abstract

The rapid growth of RDF data in RDF knowledge bases calls for efficient query processing techniques. This paper focuses on star-style SPARQL join queries, which are very common when users search for information about entities in RDF knowledge bases. We observe that the computational cost of such queries comes mainly from loading a large portion of the predicate-ahead indexes. We therefore propose to partition the whole RDF knowledge base according to the schemas of individual entities, so that only entities with similar schemas are allocated to the same cluster. This partitioning strategy yields a pruning mechanism that effectively isolates the correlations between partitions and queries. Consequently, each query is evaluated over only a small number of partitions, each with a small predicate-ahead index. Experiments on a large real-life RDF data set show that our partitioned indexing techniques achieve significant performance improvements.
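The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that an entity's schema is simply the set of predicates it appears with, that each distinct schema forms its own partition (the paper instead clusters entities with similar schemas), and that a predicate-ahead index maps predicate to object to subjects. The names `build_partitions` and `star_query` and the toy triples are hypothetical.

```python
from collections import defaultdict

def build_partitions(triples):
    """Group (subject, predicate, object) triples by the subject's schema."""
    # An entity's schema is taken to be the set of predicates it uses (assumption).
    schema_of = defaultdict(set)
    for s, p, _ in triples:
        schema_of[s].add(p)
    # Assumption: one partition per exact schema; the paper clusters similar schemas.
    # Each partition holds its own small index: predicate -> object -> subjects.
    partitions = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for s, p, o in triples:
        partitions[frozenset(schema_of[s])][p][o].add(s)
    return partitions

def star_query(partitions, patterns):
    """Evaluate a star query given as (predicate, object_or_None) pairs
    that all share one subject variable. Returns (subjects, partitions_visited)."""
    if not patterns:
        raise ValueError("a star query needs at least one triple pattern")
    needed = {p for p, _ in patterns}
    results, visited = set(), 0
    for schema, index in partitions.items():
        if not needed <= schema:
            continue  # pruned: no entity here has every queried predicate
        visited += 1
        candidates = None
        for p, o in patterns:
            by_object = index[p]
            if o is None:  # unbound object: any subject with this predicate
                subjects = set().union(*by_object.values())
            else:
                subjects = by_object.get(o, set())
            candidates = subjects if candidates is None else candidates & subjects
        results |= candidates
    return results, visited

# Toy data, invented for illustration only.
triples = [
    ("alice", "name", "Alice"), ("alice", "worksAt", "ACME"),
    ("bob", "name", "Bob"), ("bob", "worksAt", "ACME"),
    ("ACME", "name", "ACME Corp"), ("ACME", "locatedIn", "Berlin"),
    ("paris", "locatedIn", "France"),
]
partitions = build_partitions(triples)
# SELECT ?x WHERE { ?x worksAt "ACME" . ?x name ?n }
results, visited = star_query(partitions, [("worksAt", "ACME"), ("name", None)])
print(sorted(results), visited, len(partitions))
```

In this toy run only one of the three partitions contains both queried predicates, so the other two are skipped without touching their indexes. Exact-schema partitioning is the simplest choice that makes the pruning test a subset check; grouping similar schemas, as the abstract describes, would trade some pruning precision for fewer partitions.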

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Du, F., Chen, Y., Du, X. (2012). Partitioned Indexes for Entity Search over RDF Knowledge Bases. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29038-1_12

  • DOI: https://doi.org/10.1007/978-3-642-29038-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29037-4

  • Online ISBN: 978-3-642-29038-1

  • eBook Packages: Computer Science, Computer Science (R0)
