The VLDB Journal, Volume 21, Issue 2, pp 191–211

MapMerge: correlating independent schema mappings

  • Bogdan Alexe
  • Mauricio Hernández
  • Lucian Popa
  • Wang-Chiew Tan
Special Issue Paper

Abstract

One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.
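As a toy illustration of why correlating independently designed mappings matters, the sketch below applies two small mappings separately and then as one correlated mapping. The relations `Emp`, `Employee`, and `Department`, the labeled-null naming, and all function names are hypothetical and chosen only for this example. It is not the MapMerge algorithm and not the paper's similarity measure; it only shows how source associations can be lost when each mapping invents its own unknown values, and preserved when the mappings share them.

```python
from itertools import count

# Hypothetical source relation Emp(name, dept_name).
source_emp = [("alice", "sales"), ("bob", "research")]


def run_independent(emp_rows):
    """Apply two mappings separately; each invents its own labeled nulls.

    m1: Emp(n, d) -> exists X. Employee(n, X)
    m2: Emp(n, d) -> exists Y. Department(Y, d)
    """
    fresh = count(1)
    employee = [(name, f"N{next(fresh)}") for name, _ in emp_rows]
    department = [(f"N{next(fresh)}", dept) for _, dept in emp_rows]
    return employee, department


def run_correlated(emp_rows):
    """Apply one correlated mapping that shares the invented value.

    m: Emp(n, d) -> exists X. Employee(n, X) and Department(X, d)
    Here X is modeled as a function of the department name (an assumption).
    """
    dept_ids = {}
    employee, department = [], []
    for name, dept in emp_rows:
        dept_id = dept_ids.setdefault(dept, f"D({dept})")
        employee.append((name, dept_id))
        if (dept_id, dept) not in department:
            department.append((dept_id, dept))
    return employee, department


def recovered_pairs(employee, department):
    """Join the target relations on the id to see which source pairs survive."""
    return {(n, d) for n, i in employee for j, d in department if i == j}


if __name__ == "__main__":
    print(recovered_pairs(*run_independent(source_emp)))
    print(recovered_pairs(*run_correlated(source_emp)))
```

With the independent mappings, the join over the target finds no (employee, department) pairs, because the invented identifiers never match. With the correlated mapping, every source pair is recoverable from the target.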

Keywords

Schema mappings, Data exchange, Data integration

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Bogdan Alexe (1)
  • Mauricio Hernández (1)
  • Lucian Popa (1)
  • Wang-Chiew Tan (1, 2)
  1. IBM Research-Almaden, San Jose, USA
  2. UC Santa Cruz, Santa Cruz, USA