Efficient Ontology-Based Data Integration with Canonical IRIs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)


In this paper, we study how to efficiently integrate multiple relational databases using an ontology-based approach. In ontology-based data integration (OBDI), an ontology provides a coherent view of multiple databases, and SPARQL queries over the ontology are rewritten into (federated) SQL queries over the underlying databases. Specifically, we address the scenario where records with different identifiers in different databases can represent the same entity. The standard approach in this case is to use sameAs to model the equivalence between entities. However, the standard semantics of sameAs may cause an exponential blow-up of query results, since all possible combinations of equivalent identifiers have to be included in the answers. The large number of answers not only hurts the performance of query evaluation but also makes the answers difficult to understand, due to the redundancy they introduce. This motivates us to propose an alternative approach, which is based on assigning canonical IRIs to entities in order to avoid redundancy. Formally, we present our approach as a new SPARQL entailment regime and compare it with the sameAs approach. We provide a prototype implementation and evaluate it in two experiments: in a real-world data integration scenario at Statoil and in an experiment extending the Wisconsin benchmark. The experimental results show that the canonical IRI approach is significantly more scalable.
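To make the blow-up concrete, the following is a minimal sketch that works on already-computed answer tuples, not on the paper's SPARQL-to-SQL rewriting. All identifiers and helper names (`SAME_AS`, `equivalence_classes`, `class_index`, `sameas_answers`, `canonical_answer`) are hypothetical. It assumes sameAs links are given as pairs and groups them into equivalence classes with a union-find. Under sameAs semantics, a tuple whose k positions each fall in a class of size n expands to n^k answers. The canonical variant keeps one representative per class. Picking the lexicographically smallest IRI as that representative is an illustrative choice, not necessarily how the paper selects canonical IRIs.

```python
from itertools import product

# Hypothetical sameAs links between identifiers from three databases.
SAME_AS = [
    ("db1:well/1", "db2:w-0001"),
    ("db2:w-0001", "db3:W1"),
    ("db1:field/7", "db2:f-07"),
]


def equivalence_classes(pairs):
    """Group identifiers into equivalence classes using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


def class_index(classes):
    """Map every IRI to the equivalence class it belongs to."""
    return {iri: cls for cls in classes for iri in cls}


def sameas_answers(answer, index):
    """sameAs semantics: every combination of equivalent IRIs is an answer."""
    options = [sorted(index.get(v, {v})) for v in answer]
    return list(product(*options))


def canonical_answer(answer, index):
    """Canonical-IRI sketch: replace each IRI by one fixed representative.

    The representative here is the lexicographically smallest IRI of the
    class, an assumption made only for illustration.
    """
    return tuple(min(index.get(v, {v})) for v in answer)


if __name__ == "__main__":
    classes = equivalence_classes(SAME_AS)
    index = class_index(classes)
    row = ("db1:well/1", "db1:field/7")
    print(len(sameas_answers(row, index)))  # 6 = 3 * 2 combinations
    print(canonical_answer(row, index))     # ('db1:well/1', 'db1:field/7')
```

With only two answer variables, the sameAs expansion already returns six rows for what is a single fact. Each additional variable bound to an entity with n equivalent identifiers multiplies the count by n. The canonical variant always returns exactly one row per fact.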



This research is supported by the project OBATS, funded by Free University of Bozen-Bolzano, by the Euregio IPN12 KAOS, funded by the “European Region Tyrol-South Tyrol-Trentino” (EGTC) under the first call for basic research projects, and by the Sirius Centre funded by the Norwegian Research Council.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy
  2. Department of Informatics, University of Oslo, Oslo, Norway
  3. National and Kapodistrian University of Athens, Athens, Greece
  4. Rakuten, Tokyo, Japan
