Abstract
We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers. We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Harth, A., Umbrich, J., Hogan, A., Decker, S. (2007). YARS2: A Federated Repository for Querying Graph Structured Data from the Web. In: Aberer, K., et al. The Semantic Web. ISWC 2007, ASWC 2007. Lecture Notes in Computer Science, vol 4825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76298-0_16
DOI: https://doi.org/10.1007/978-3-540-76298-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76297-3
Online ISBN: 978-3-540-76298-0
eBook Packages: Computer Science, Computer Science (R0)