Semi-automated schema integration with SASMINT

Unal, Ozgul; Afsarmanesh, Hamideh

doi:10.1007/s10115-009-0217-z

Regular Paper
Open access
Published: 19 June 2009

Volume 23, pages 99–128, (2010)

Knowledge and Information Systems

Ozgul Unal¹ &
Hamideh Afsarmanesh¹

Abstract

The growing number of collaborating organizations has made clear the need for interoperability infrastructures that enable organizations to share and exchange data. Schema matching and schema integration are crucial components of such infrastructures, and semi-automating them to interrelate or integrate heterogeneous and autonomous databases in collaborative networks is desirable. The Semi-Automatic Schema Matching and INTegration (SASMINT) system introduced in this paper identifies and resolves several important syntactic, semantic, and structural conflicts among the schemas of relational databases in order to find their likely matches automatically. After the user validates the match results, it proposes an integrated schema. For schema matching, SASMINT combines a variety of metrics and algorithms from the Natural Language Processing and Graph Theory domains. For schema integration, it applies a number of derivation rules defined as part of the research described in this paper. In addition, a derivation language called the SASMINT Derivation Markup Language (SDML) is defined to capture and formulate the results of both matching and integration, which can then be used further, for example for federated query processing over independent databases. In summary, the paper addresses: (1) conflicts among schemas that make automatic schema matching and integration difficult, (2) the main components of the SASMINT approach and system, (3) an in-depth exploration of SDML, (4) the heuristic rules designed and implemented as part of the schema integration component of the SASMINT system, and (5) an experimental evaluation of SASMINT.
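
To make the idea of combining linguistic metrics for schema matching more concrete, the following is a minimal sketch and not SASMINT's actual algorithm. It assumes two common string measures (a normalized edit-distance similarity and a Jaccard similarity over underscore-separated name tokens), combines them with equal weights, and proposes column pairs above a threshold for user validation. The function names, the weights, and the threshold are all hypothetical choices made for illustration.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over underscore-separated, lower-cased tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def combined_similarity(a: str, b: str,
                        w_edit: float = 0.5, w_token: float = 0.5) -> float:
    """Weighted sum of the two metrics (weights are illustrative assumptions)."""
    return (w_edit * edit_similarity(a.lower(), b.lower())
            + w_token * token_jaccard(a, b))


def propose_matches(cols_a, cols_b, threshold: float = 0.5):
    """Return candidate (name_a, name_b, score) triples, best first.

    The candidates are meant to be shown to a user for validation,
    mirroring the semi-automatic workflow described in the abstract.
    """
    scored = [(a, b, combined_similarity(a, b))
              for a, b in product(cols_a, cols_b)]
    kept = [m for m in scored if m[2] >= threshold]
    return sorted(kept, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    for a, b, score in propose_matches(["cust_name", "order_id"],
                                       ["customer_name", "id_order", "price"]):
        print(f"{a} <-> {b}: {score:.2f}")
```

A weighted sum is only one way to combine metrics; the sketch also ignores the semantic and structural (graph-based) evidence that the abstract says SASMINT takes into account.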

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Ozgul Unal & Hamideh Afsarmanesh

Corresponding author

Correspondence to Ozgul Unal.

About this article

Cite this article

Unal, O., Afsarmanesh, H. Semi-automated schema integration with SASMINT. Knowl Inf Syst 23, 99–128 (2010). https://doi.org/10.1007/s10115-009-0217-z

Received: 04 August 2008
Revised: 19 February 2009
Accepted: 11 April 2009
Published: 19 June 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s10115-009-0217-z
