Partitioned Indexes for Entity Search over RDF Knowledge Bases

  • Fang Du
  • Yueguo Chen
  • Xiaoyong Du
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7238)

Abstract

The rapid growth of RDF data in RDF knowledge bases calls for efficient query processing techniques. This paper focuses on star-style SPARQL join queries, which are very common when users search for information about entities in RDF knowledge bases. We observe that the computational cost of such queries comes mainly from loading large portions of predicate-ahead indexes. We therefore propose to partition the whole RDF knowledge base based on the schemas of individual entities, so that only entities with similar schemas are allocated to the same cluster. Such a partitioning strategy yields a pruning mechanism that effectively captures the correlations between partitions and queries, so that partitions irrelevant to a query can be skipped. Consequently, queries are conducted over only a small number of partitions with small predicate-ahead indexes. Experiments over a large real-life RDF data set show the significant performance improvements achieved by our partitioned indexing techniques.
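The abstract does not specify the clustering algorithm or the exact index layout, so the following Python sketch only illustrates the general idea under stated assumptions. An entity's schema is taken to be the set of predicates it uses as a subject. Entities are grouped by a simple greedy Jaccard-similarity clustering, which is a stand-in and not the authors' method. Each partition then gets its own small predicate-ahead index, and a star query skips any partition whose predicate set does not cover every query predicate. All function names, the `threshold` parameter and the toy data are hypothetical.

```python
from collections import defaultdict


def entity_schemas(triples):
    """Map each subject (entity) to its schema: the set of predicates it uses."""
    schemas = defaultdict(set)
    for s, p, _ in triples:
        schemas[s].add(p)
    return {s: frozenset(ps) for s, ps in schemas.items()}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def partition_entities(schemas, threshold=0.5):
    """Assumed greedy clustering: put an entity into the first partition whose
    representative schema is at least `threshold` Jaccard-similar to its own,
    otherwise open a new partition represented by the entity's schema."""
    partitions = []  # list of (representative schema, member entities)
    for entity in sorted(schemas):
        schema = schemas[entity]
        for rep, members in partitions:
            if jaccard(rep, schema) >= threshold:
                members.append(entity)
                break
        else:
            partitions.append((schema, [entity]))
    return [members for _, members in partitions]


def build_partitioned_indexes(triples, partitions):
    """Per partition: a predicate-ahead index predicate -> {subject: [objects]},
    paired with the set of predicates occurring in that partition."""
    part_of = {e: i for i, members in enumerate(partitions) for e in members}
    indexes = [defaultdict(lambda: defaultdict(list)) for _ in partitions]
    for s, p, o in triples:
        indexes[part_of[s]][p][s].append(o)
    return [(frozenset(idx), idx) for idx in indexes]


def star_query(partitioned, patterns):
    """Answer ?s p1 o1 . ?s p2 o2 ... where o is a constant or None (a variable).
    Returns the matching subjects and the number of partitions actually scanned."""
    if not patterns:
        return [], 0
    needed = {p for p, _ in patterns}
    results, scanned = set(), 0
    for preds, idx in partitioned:
        if not needed <= preds:
            continue  # pruned: no entity here has every query predicate
        scanned += 1
        candidates = None
        for p, o in patterns:
            # Subjects in this partition satisfying one triple pattern.
            matches = {s for s, objs in idx[p].items() if o is None or o in objs}
            candidates = matches if candidates is None else candidates & matches
        results |= candidates
    return sorted(results), scanned


# Hypothetical toy knowledge base.
triples = [
    ("alice", "type", "Person"), ("alice", "bornIn", "Paris"), ("alice", "worksAt", "acme"),
    ("bob", "type", "Person"), ("bob", "bornIn", "Rome"),
    ("acme", "type", "Company"), ("acme", "locatedIn", "Paris"), ("acme", "foundedIn", "1990"),
]
partitions = partition_entities(entity_schemas(triples))
partitioned = build_partitioned_indexes(triples, partitions)
# SELECT ?s WHERE { ?s type Person . ?s bornIn ?x }
answer, scanned = star_query(partitioned, [("type", "Person"), ("bornIn", None)])
```

On this toy data the company entity lands in its own partition, and the person query touches only the partition holding `alice` and `bob`; the company partition is pruned because it lacks the `bornIn` predicate. Grouping by similar rather than identical schemas keeps the number of partitions small while still letting most partitions be skipped.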

Keywords

Entity search, SPARQL query, index, clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fang Du (1)
  • Yueguo Chen (2)
  • Xiaoyong Du (1, 2)
  1. School of Information, Renmin University of China, Beijing, China
  2. Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), MOE, China

Personalised recommendations