Ontology-Based Integration of Cross-Linked Datasets

  • Diego Calvanese
  • Martin Giese
  • Dag Hovland
  • Martin Rezk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)


In this paper we tackle the problem of answering SPARQL queries over virtually integrated databases. We assume that the entity resolution problem has already been solved and that explicit information is available about which records in the different databases refer to the same real-world entity. Surprisingly, to the best of our knowledge, there has been no attempt to extend the standard Ontology-Based Data Access (OBDA) setting to take these database links into account for SPARQL query answering and consistency checking. This is partly because the OWL built-in property owl:sameAs, the most natural representation of links between datasets, is not included in OWL 2 QL, the de facto ontology language for OBDA. We formally treat several fundamental questions in this context: how links over database identifiers can be represented in terms of owl:sameAs statements, how to recover the rewritability of SPARQL into SQL (which is lost because of owl:sameAs statements), and how to check consistency. Moreover, we investigate how our solution can be made to scale up to large enterprise datasets. We have implemented the approach and carried out an extensive set of experiments showing its scalability.
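To make the role of owl:sameAs concrete, the following is a minimal sketch of its semantics only, not the rewriting technique of the paper (which translates SPARQL into SQL rather than materializing anything). It assumes links are given as pairs of identifiers, closes them under reflexivity, symmetry, and transitivity with a union-find structure, and then answers a single triple pattern modulo the resulting equivalence classes. All identifiers, predicates, and function names are hypothetical.

```python
from collections import defaultdict


def same_as_classes(links):
    """Map each identifier to its equivalence class under the given links."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the classes of every linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return {m: frozenset(g) for g in groups.values() for m in g}


def answer(triples, classes, subject, predicate):
    """Objects o with (s, predicate, o) for some s equivalent to subject."""
    equivalents = classes.get(subject, frozenset({subject}))
    return {o for s, p, o in triples if p == predicate and s in equivalents}


# Hypothetical links between identifiers of three databases.
links = [("db1:well/7", "db2:w/A13"), ("db2:w/A13", "db3:bore/42")]
# Hypothetical facts, each stated in only one database.
triples = [
    ("db1:well/7", ":name", "Alpha"),
    ("db3:bore/42", ":depth", 3100),
]

classes = same_as_classes(links)
depths = answer(triples, classes, "db1:well/7", ":depth")
```

Here `depths` contains the depth recorded in the third database even though the query names the identifier from the first. Computing the closure eagerly like this is the naive baseline; it illustrates why equality reasoning is costly and why recovering rewritability into SQL matters for large datasets.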


Keywords: SPARQL Query, Triple Pattern, Entity Resolution, Query Answering, Query Execution Time



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Diego Calvanese (1)
  • Martin Giese (2)
  • Dag Hovland (2)
  • Martin Rezk (1, email author)

  1. Free University of Bozen-Bolzano, Bolzano, Italy
  2. University of Oslo, Oslo, Norway
