Semantics-aware data integration for heterogeneous data sources

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

This article presents a novel declarative mapping language that maps, precisely and unambiguously, the semantics of a domain conceptualization (defined as an ontology) into queries over the set of data sources where the data resides. A system using this mapping language can thus access the data actually stored in the data sources through a semantically rich representation. The mapping model proposed in this paper is itself an ontology and is therefore machine understandable: it can be shared with other users or systems, processed by external tools for consistency checking, created collaboratively, and so on. Besides the mapping model itself, this paper introduces the concepts of Semantic Join and Semantic Identifiers: a declarative approach to semantic data fusion and entity resolution over multiple unrelated databases, which allows extremely expressive mappings to be defined.
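To make the idea of mapping an ontology concept onto queries over several sources more concrete, the following is a minimal sketch and not the mapping language defined in the article. It assumes two hypothetical SQLite sources, a plain Python list standing in for the mapping ontology, and a simple key-normalization rule (`semantic_id`) standing in for a Semantic Identifier; all table, column, and function names are illustrative assumptions.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create a hypothetical in-memory data source with its own schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two unrelated sources that describe the same devices differently.
inventory = make_source(
    "CREATE TABLE devices (serial TEXT, model TEXT)",
    "INSERT INTO devices VALUES (?, ?)",
    [("SN-001", "R7000"), ("SN-002", "S2400")],
)
monitoring = make_source(
    "CREATE TABLE status (device_sn TEXT, state TEXT)",
    "INSERT INTO status VALUES (?, ?)",
    [("sn-001", "up"), ("sn-003", "down")],
)

# Declarative mapping for an assumed concept "Router": each property is
# bound to a query on one source. Every query returns the source-specific
# key first, followed by the property value.
ROUTER_MAPPING = [
    {"source": inventory, "property": "model",
     "query": "SELECT serial, model FROM devices"},
    {"source": monitoring, "property": "state",
     "query": "SELECT device_sn, state FROM status"},
]


def semantic_id(raw_key):
    """Normalize a source-specific key into a shared identifier (assumed rule)."""
    return raw_key.strip().upper()


def instances(mapping):
    """Run every mapped query and fuse rows that share an identifier."""
    fused = {}
    for entry in mapping:
        for raw_key, value in entry["source"].execute(entry["query"]):
            fused.setdefault(semantic_id(raw_key), {})[entry["property"]] = value
    return fused


routers = instances(ROUTER_MAPPING)
```

In this sketch the record for `SN-001` collects a `model` from one source and a `state` from the other, even though the two databases share no schema and spell the key differently. Because the mapping is data rather than code, it could in principle be shared or checked by external tools, which is the property the abstract attributes to an ontology-based mapping model.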

Notes

  1. Since ontologies also contain the concept of relation, we use the term DB relation to refer to relations in databases and the term relation to refer to ontology relations.

  2. E.g.: a reasoning service on the T-Box is subsumption, typically written as \(C \sqsubseteq D\). Determining subsumption is the problem of checking whether the concept denoted by D (the subsumer) is considered more general than the one denoted by C (the subsumee). In other words, subsumption checks whether the first concept always denotes a subset of the set denoted by the second one. For example, one might be interested in knowing whether \(\mathit{Router} \sqsubseteq \mathit{NetworkDevices}\).

  3. http://www.w3.org/TR/xquery/.

  4. http://www.w3.org/TR/xslt/.

Acknowledgments

The authors would like to thank Prof. Ernesto Damiani of Università degli Studi di Milano for his valuable support.

Author information

Corresponding author

Correspondence to Marcello Leida.

About this article

Cite this article

Leida, M., Gusmini, A. & Davies, J. Semantics-aware data integration for heterogeneous data sources. J Ambient Intell Human Comput 4, 471–491 (2013). https://doi.org/10.1007/s12652-012-0165-4

  • DOI: https://doi.org/10.1007/s12652-012-0165-4
