EvoMatch: An Evolutionary Algorithm for Inferring Schematic Correspondences

  • Chapter
Transactions on Large-Scale Data- and Knowledge-Centered Systems XII

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8320)

Abstract

Schema matching provides an important foundation for both manual and semi-automatic derivation of mappings between sources. However, schema matchers typically return large numbers of potentially inconsistent matches, which are neither conducive to automatic mapping generation nor readily digested by mapping developers. This paper presents EvoMatch, a method for automatically inferring schematic correspondences from which mappings can be generated directly. It aims to characterize the relationships between sources more expressively than the matches identified by existing schema matching methods. In particular, the paper contributes: (i) an evolutionary search method for inferring schematic correspondences; (ii) an objective function for calculating the fitness value of a solution within the search space; and (iii) an empirical evaluation of the effectiveness of EvoMatch for inferring schematic correspondences, in comparison with well-established existing techniques. EvoMatch thus automatically identifies correspondences that can be used directly to bootstrap information integration systems, or to inform the manual refinement of mappings.
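The abstract names an evolutionary search guided by an objective function but gives no detail of either. As a rough illustration only, the sketch below shows what a generic evolutionary search over a set of candidate matches could look like. It is not EvoMatch: the candidate matches, the bit-vector encoding, the conflict penalty, and the names `MATCHES`, `CONFLICT_PENALTY`, `fitness` and `evolve` are all assumptions made for this example, and the fitness function is a toy stand-in for the paper's objective function.

```python
import random

# Hypothetical candidate matches: (source attribute, target attribute, similarity).
MATCHES = [
    ("author.name", "writer.full_name", 0.9),
    ("author.name", "book.title", 0.4),
    ("book.title", "publication.title", 0.8),
    ("book.year", "publication.title", 0.3),
    ("book.year", "publication.year", 0.7),
]
CONFLICT_PENALTY = 0.5  # assumed cost each time an attribute is reused


def fitness(individual):
    """Toy objective: selected similarity minus a penalty per reused attribute."""
    chosen = [m for bit, m in zip(individual, MATCHES) if bit]
    score = sum(sim for _, _, sim in chosen)
    sources = [src for src, _, _ in chosen]
    targets = [tgt for _, tgt, _ in chosen]
    reuse = (len(sources) - len(set(sources))) + (len(targets) - len(set(targets)))
    return score - CONFLICT_PENALTY * reuse


def evolve(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    """Generic evolutionary loop: elitism, tournament selection, crossover, mutation."""
    rng = random.Random(seed)
    n = len(MATCHES)
    # Start from the empty selection plus random bit vectors.
    population = [[0] * n]
    population += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        best = max(population, key=fitness)
        next_gen = [best[:]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, round(fitness(best), 2))
```

In this toy setting an individual is one bit per candidate match, so the search looks for a consistent subset of matches; the paper's search space consists of schematic correspondences, which are richer than such subsets.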

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Guo, C., Hedeler, C., Paton, N.W., Fernandes, A.A.A. (2013). EvoMatch: An Evolutionary Algorithm for Inferring Schematic Correspondences. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XII. Lecture Notes in Computer Science, vol 8320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45315-1_1

  • DOI: https://doi.org/10.1007/978-3-642-45315-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45314-4

  • Online ISBN: 978-3-642-45315-1

  • eBook Packages: Computer Science, Computer Science (R0)
