A Semantic Web Middleware for Virtual Data Integration on the Web

  • Andreas Langegger
  • Wolfram Wöß
  • Martin Blöchl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5021)


This contribution presents a system that provides access to distributed data sources using Semantic Web technology. Although it was designed primarily for data sharing and scientific collaboration, it is regarded as a base technology useful for many other Semantic Web applications. The proposed system allows data to be retrieved using SPARQL queries; data sources can register and leave freely, and any RDF Schema or OWL vocabulary can be used to describe their data, as long as the vocabulary is accessible on the Web. Data heterogeneity is addressed by RDF wrappers such as D2R Server placed on top of local information systems. A query does not refer directly to actual endpoints; instead, it contains graph patterns adhering to a virtual data set. A mediator then pulls and joins RDF data from different endpoints, providing a transparent on-the-fly view to the end user.
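
The abstract does not describe how the mediator decides which endpoints can answer a graph pattern. As a rough illustration only, the following Python sketch assumes a registry in which each endpoint announces the predicates it can serve, so that a triple pattern can be matched to candidate endpoints. The class `Registry`, its methods, and the `example.org` endpoint URLs are hypothetical names introduced for this sketch and are not taken from the described system.

```python
from collections import defaultdict


class Registry:
    """Hypothetical registry mapping predicate URIs to the endpoints serving them."""

    def __init__(self):
        self._by_predicate = defaultdict(set)

    def register(self, endpoint, predicates):
        # A data source joins by announcing the predicates it can answer.
        for predicate in predicates:
            self._by_predicate[predicate].add(endpoint)

    def unregister(self, endpoint):
        # A data source may leave at any time.
        for endpoints in self._by_predicate.values():
            endpoints.discard(endpoint)

    def sources_for(self, pattern):
        # Select candidate endpoints for a triple pattern (subject, predicate, object).
        _, predicate, _ = pattern
        return sorted(self._by_predicate.get(predicate, set()))


FOAF = "http://xmlns.com/foaf/0.1/"
ENDPOINT_A = "http://example.org/a/sparql"  # assumed endpoint URL
ENDPOINT_B = "http://example.org/b/sparql"  # assumed endpoint URL

registry = Registry()
registry.register(ENDPOINT_A, [FOAF + "name"])
registry.register(ENDPOINT_B, [FOAF + "name", FOAF + "mbox"])

pattern = ("?person", FOAF + "name", "?name")
before = registry.sources_for(pattern)

registry.unregister(ENDPOINT_A)
after = registry.sources_for(pattern)

print(before)
print(after)
```

The query itself only names the pattern; the lookup supplies the endpoints, which is the sense in which the user never has to specify them. Matching on predicates alone is a simplification chosen for brevity.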

The SPARQL protocol has been defined to enable systematic data access to remote endpoints. However, remote SPARQL queries require endpoint URIs to be specified explicitly. The presented system allows users to execute queries without specifying target endpoints. Additionally, join and union operations can be executed across different remote endpoints. Optimizing such distributed operations is a key factor in the performance of the overall system, and proven concepts from database research can be applied for this purpose.
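
To make the join and union operations concrete, here is a minimal Python sketch of how a mediator could combine result sets that have already been fetched from two endpoints. It follows the standard SPARQL notion of compatible solution mappings and uses a naive nested-loop join; the abstract does not state which join strategy the system uses, and the helper names and sample data below are assumptions made for this example.

```python
def compatible(left, right):
    # Two solution mappings are compatible if they agree on every shared variable.
    return all(right[var] == value for var, value in left.items() if var in right)


def join(left_rows, right_rows):
    # Naive nested-loop join: merge every compatible pair of mappings.
    return [
        {**left, **right}
        for left in left_rows
        for right in right_rows
        if compatible(left, right)
    ]


def union(left_rows, right_rows):
    # SPARQL union keeps duplicates, so it is a plain concatenation.
    return list(left_rows) + list(right_rows)


# Sample bindings standing in for the results of two remote endpoints.
names_from_a = [
    {"person": "ex:alice", "name": "Alice"},
    {"person": "ex:bob", "name": "Bob"},
]
mboxes_from_b = [
    {"person": "ex:alice", "mbox": "mailto:alice@example.org"},
]

joined = join(names_from_a, mboxes_from_b)
combined = union(names_from_a, mboxes_from_b)

print(joined)
print(len(combined))
```

A nested-loop join compares every pair of rows, which is exactly the kind of cost that distributed query optimization tries to reduce, for example by choosing a join order or shipping bindings to the remote endpoint.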



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Andreas Langegger (1)
  • Wolfram Wöß (1)
  • Martin Blöchl (1)

  1. Institute of Applied Knowledge Processing, Johannes Kepler University Linz, Linz, Austria
