EvoMatch: An Evolutionary Algorithm for Inferring Schematic Correspondences

Guo, Chenjuan; Hedeler, Cornelia; Paton, Norman W.; Fernandes, Alvaro A. A.

doi:10.1007/978-3-642-45315-1_1

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8320)

Abstract

Schema matching provides an important foundation for both manual and semi-automatic derivation of mappings between sources. However, schema matchers typically return large numbers of potentially inconsistent matches that are neither conducive to automatic mapping generation nor readily digested by mapping developers. This paper presents a method, EvoMatch, for automatically inferring schematic correspondences, from which mappings can be generated directly. It aims to offer a more expressive characterization of the relationships between sources than the matches identified by existing schema matching methods. In particular, the paper contributes: i) an evolutionary search method for inferring schematic correspondences; ii) an objective function for calculating the fitness value of a solution within the search space; and iii) an empirical evaluation assessing the effectiveness of EvoMatch for inferring schematic correspondences in comparison with well-established existing techniques. In doing so, EvoMatch automatically identifies correspondences that can be used directly to bootstrap information integration systems, or to inform the manual refinement of mappings.
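
The abstract names an evolutionary search guided by an objective function but does not give its details. The following is only a generic sketch of that kind of search, not EvoMatch itself: the attribute names, the scores in MATCH_SCORES, the bit-string encoding, the conflict penalty, and the selection, crossover and mutation operators are all illustrative assumptions.

```python
import random

# Hypothetical similarity scores between source and target attributes.
# Illustrative assumptions only; these are not data or results from the paper.
MATCH_SCORES = {
    ("author", "writer"): 0.9,
    ("author", "title"): 0.1,
    ("name", "title"): 0.8,
    ("name", "writer"): 0.3,
    ("year", "published"): 0.7,
    ("year", "title"): 0.2,
}
CANDIDATES = sorted(MATCH_SCORES)  # fixed order; one bit per candidate pair


def fitness(individual):
    """Assumed objective: sum of selected scores minus 1.0 per reused attribute."""
    chosen = [pair for pair, bit in zip(CANDIDATES, individual) if bit]
    score = sum(MATCH_SCORES[pair] for pair in chosen)
    sources = [s for s, _ in chosen]
    targets = [t for _, t in chosen]
    conflicts = (len(sources) - len(set(sources))) + (len(targets) - len(set(targets)))
    return score - conflicts


def mutate(individual, rng, rate=0.2):
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - bit if rng.random() < rate else bit for bit in individual)


def crossover(a, b, rng):
    """One-point crossover of two equal-length bit tuples."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(generations=50, population_size=20, seed=0):
    """Return (selected pairs, fitness) of the best individual found."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in CANDIDATES) for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children  # parents survive unchanged (elitism)
    best = max(population, key=fitness)
    return [pair for pair, bit in zip(CANDIDATES, best) if bit], fitness(best)


if __name__ == "__main__":
    pairs, score = evolve()
    print(pairs, round(score, 2))
```

In this toy setting the penalty term plays the role of discouraging inconsistent matches, so the search favours a small, mutually consistent set of pairs over the full list of candidate matches.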

Author information

Authors and Affiliations

School of Computer Science, University of Manchester, M13 9PL, UK
Chenjuan Guo, Cornelia Hedeler, Norman W. Paton & Alvaro A. A. Fernandes

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Austria
Josef Küng & Roland Wagner

About this chapter

Cite this chapter

Guo, C., Hedeler, C., Paton, N.W., Fernandes, A.A.A. (2013). EvoMatch: An Evolutionary Algorithm for Inferring Schematic Correspondences. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XII. Lecture Notes in Computer Science, vol 8320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45315-1_1

DOI: https://doi.org/10.1007/978-3-642-45315-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45314-4
Online ISBN: 978-3-642-45315-1
eBook Packages: Computer Science, Computer Science (R0)
