Abstract
Schema matching has long been moving towards full automation. However, the difficulties arising from heterogeneous data sources, domain specificity, and structural complexity have led to a plethora of semi-automatic matching tools. Moreover, giving users the possibility to tune a tool provides more flexibility, for instance to increase matching quality. In recent years, much work has been carried out to support users in the tuning process, specifically at the higher levels. Indeed, tuning occurs at every step of the matching process. At the lowest level, similarity measures include internal parameters that directly impact the computed similarity values. Furthermore, thresholds applied to these values are a common filter for deciding which mappings are presented to users. At the mid-level, users can adopt one or more strategies, depending on the matching tool they use. These strategies aim at combining similarity measures efficiently. Several tools support users in this task, mainly by providing state-of-the-art graphical user interfaces. Automatic tuning at this level is also possible, but it is limited to a few matching tools. The highest level deals with the choice of the matching tool itself. Given the proliferation of matching approaches, the first issue for users is to find the one that best satisfies their criteria. Although benchmarking the available matching tools against datasets can be useful, we show that several approaches have recently been designed to solve this problem.
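The low-level and mid-level tuning knobs mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from any tool discussed in the chapter: the two measures (character n-gram Dice similarity and Python's `difflib` sequence ratio), the function names, the default weights, and the default threshold. The n-gram size `n` plays the role of an internal parameter of a similarity measure, `weights` stands for a simple combination strategy, and `threshold` filters which candidate mappings would be shown to the user.

```python
from difflib import SequenceMatcher


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams; n is an internal parameter."""
    def grams(s: str) -> set:
        s = s.lower()
        # Strings shorter than n yield a single gram (the string itself).
        return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def sequence_similarity(a: str, b: str) -> float:
    """Ratio of matching characters as computed by difflib (stdlib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(a, b, weights=(0.5, 0.5), n=3):
    """Mid-level tuning: a weighted sum of the two measures (hypothetical strategy)."""
    w_ngram, w_seq = weights
    return w_ngram * ngram_similarity(a, b, n) + w_seq * sequence_similarity(a, b)


def match(source, target, threshold=0.6, weights=(0.5, 0.5), n=3):
    """Return candidate mappings whose combined score reaches the threshold."""
    candidates = []
    for s in source:
        for t in target:
            score = combined_similarity(s, t, weights, n)
            if score >= threshold:
                candidates.append((s, t, round(score, 3)))
    # Highest-scoring candidates first.
    return sorted(candidates, key=lambda c: -c[2])


if __name__ == "__main__":
    source_schema = ["customerName", "zipCode", "phone"]
    target_schema = ["custName", "postalCode", "phoneNumber"]
    for mapping in match(source_schema, target_schema, threshold=0.5):
        print(mapping)
```

Raising `threshold` can only shrink the list of candidates, while changing `n` or `weights` changes the scores themselves. This is why the same pair of schemas can yield quite different mappings under different settings, and why the weights here are assumed to sum to 1 so that the combined score stays in the range 0 to 1.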
Notes
1. SecondString (May 2010): http://sourceforge.net/projects/secondstring/.
2. SimMetrics (May 2010): http://www.dcs.shef.ac.uk/~sam/stringmetrics.html.
Acknowledgements
We thank our reviewers for their comments and corrections on this chapter. We are also grateful to the colleagues who agreed to the publication of pictures of their tools.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bellahsene, Z., Duchateau, F. (2011). Tuning for Schema Matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds) Schema Matching and Mapping. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16518-4_10
DOI: https://doi.org/10.1007/978-3-642-16518-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16517-7
Online ISBN: 978-3-642-16518-4
eBook Packages: Computer Science (R0)