The VLDB Journal, Volume 21, Issue 2, pp 191–211

MapMerge: correlating independent schema mappings

  • Bogdan Alexe
  • Mauricio Hernández
  • Lucian Popa
  • Wang-Chiew Tan
Special Issue Paper


One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.


Keywords: Schema mappings, Data exchange, Data integration





Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Bogdan Alexe (1)
  • Mauricio Hernández (1)
  • Lucian Popa (1)
  • Wang-Chiew Tan (1, 2)
  1. IBM Research-Almaden, San Jose, USA
  2. UC Santa Cruz, Santa Cruz, USA
