Abstract
More and more applications use RDF as their data model and RDF stores to index and retrieve their data. Many of these applications require both structured queries and fulltext search. SPARQL addresses the first requirement in a standardized way, while fulltext search is provided by store-specific implementations. RDF benchmarks enable developers to compare the structured query performance of different stores, but no such benchmarks or comparisons exist so far for fulltext search on RDF data. In this paper, we extend the LUBM benchmark with synthetic, scalable fulltext data and corresponding queries to evaluate fulltext-related query performance. Based on the extended benchmark, we provide a detailed comparison of the fulltext search features and performance of the most widely used RDF stores. The results offer insights into how RDF stores handle basic fulltext queries (classic IR queries) as well as hybrid queries (combining structured and fulltext conditions). Our results are not only valuable for selecting the right RDF store for specific applications, but also reveal the need for performance improvements for certain kinds of queries.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Minack, E., Siberski, W., Nejdl, W. (2009). Benchmarking Fulltext Search Performance of RDF Stores. In: Aroyo, L., et al. The Semantic Web: Research and Applications. ESWC 2009. Lecture Notes in Computer Science, vol 5554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02121-3_10
DOI: https://doi.org/10.1007/978-3-642-02121-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02120-6
Online ISBN: 978-3-642-02121-3
eBook Packages: Computer Science, Computer Science (R0)