YARS2: A Federated Repository for Querying Graph Structured Data from the Web

  • Andreas Harth
  • Jürgen Umbrich
  • Aidan Hogan
  • Stefan Decker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4825)


We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers. We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Andreas Harth (1)
  • Jürgen Umbrich (1)
  • Aidan Hogan (1)
  • Stefan Decker (1)
  1. National University of Ireland, Galway, Digital Enterprise Research Institute
