Defining and Using Schematic Correspondences for Automatically Generating Schema Mappings

  • Lu Mao
  • Khalid Belhajjame
  • Norman W. Paton
  • Alvaro A. A. Fernandes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5565)

Abstract

Mapping specification has been recognised as a critical bottleneck to the large-scale deployment of data integration systems. A mapping is a description of how data structured under one schema are transformed into data structured under a different schema, and is central to data integration and data exchange systems. In this paper, we argue that the classical approach of correspondence identification followed by (manual) mapping generation can be simplified by removing the second step through judicious refinement of the correspondences captured. As a step in this direction, we present a model for schematic correspondences that builds on and extends the classification proposed by Kim et al. to cater for the automatic derivation of mappings, together with an algorithm that shows how correspondences specified in the proposed model can be used to derive schema mappings. The approach is illustrated using a case study from integration in proteomics.

Keywords

Schematic correspondences, schema mappings, mapping generation

References

  1. Boyd, M., Kittivoravitkul, S., Lazanitis, C., McBrien, P., Rizopoulos, N.: Automed: A bav data integration system for heterogeneous data sources. In: Persson, A., Stirna, J. (eds.) CAiSE 2004. LNCS, vol. 3084, pp. 82–97. Springer, Heidelberg (2004)
  2. Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., Rosati, R.: Information integration: Conceptual modeling and reasoning support. In: CoopIS, pp. 280–291. IEEE Computer Society Press, Los Alamitos (1998)
  3. Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397–434 (1979)
  4. Doan, A., Domingos, P., Halevy, A.Y.: Reconciling schemas of disparate data sources: A machine-learning approach. In: SIGMOD Conference, pp. 509–520 (2001)
  5. Doan, A., Halevy, A.Y.: Semantic integration research in the database community: A brief survey. AI Magazine 26(1), 83–94 (2005)
  6. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
  7. Hakimpour, F., Geppert, A.: Global schema generation using formal ontologies. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds.) ER 2002. LNCS, vol. 2503, pp. 307–321. Springer, Heidelberg (2002)
  8. Halevy, A.Y., Franklin, M.J., Maier, D.: Principles of dataspace systems. In: Vansummeren, S. (ed.) PODS, pp. 1–9. ACM, New York (2006)
  9. Halevy, A.Y., Rajaraman, A., Ordille, J.J.: Data integration: The teenage years. In: Dayal, U., Whang, K.-Y., Lomet, D.B., Alonso, G., Lohman, G.M., Kersten, M.L., Cha, S.K., Kim, Y.-K. (eds.) VLDB, pp. 9–16. ACM, New York (2006)
  10. Kedad, Z., Bouzeghoub, M.: Discovering view expressions from a multi-source information system. In: CoopIS, pp. 57–68. IEEE Computer Society, Los Alamitos (1999)
  11. Kim, W., Choi, I., Gala, S.K., Scheevel, M.: On resolving schematic heterogeneity in multidatabase systems. In: Modern Database Systems, pp. 521–550. ACM Press and Addison-Wesley (1995)
  12. Kim, W., Seo, J.: Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer 24(12), 12–18 (1991)
  13. Lenzerini, M.: Data integration: A theoretical perspective. In: Popa, L. (ed.) PODS, pp. 233–246. ACM, New York (2002)
  14. Magnani, M., Rizopoulos, N., McBrien, P., Montesi, D.: Schema integration based on uncertain semantic mappings. In: Delcambre, L.M.L., Kop, C., Mayr, H.C., Mylopoulos, J., Pastor, Ó. (eds.) ER 2005. LNCS, vol. 3716, pp. 31–46. Springer, Heidelberg (2005)
  15. McCann, R., AlShebli, B.K., Le, Q., Nguyen, H., Vu, L., Doan, A.: Mapping maintenance for data integration systems. In: Böhm, K., Jensen, C.S., Haas, L.M., Kersten, M.L., Larson, P.-Å., Ooi, B.C. (eds.) VLDB, pp. 1018–1030. ACM, New York (2005)
  16. Pottinger, R., Bernstein, P.A.: Creating a mediated schema based on initial correspondences. IEEE Data Eng. Bull. 25(3), 26–31 (2002)
  17. Pottinger, R., Bernstein, P.A.: Merging models based on given correspondences. In: VLDB, pp. 826–873 (2003)
  18. Quix, C., Kensche, D., Li, X.: Generic schema merging. In: 9th International Conference on Advanced Information Systems Engineering, pp. 127–141. Springer, Heidelberg (2007)
  19. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
  20. Yan, L.-L., Miller, R.J., Haas, L.M., Fagin, R.: Data-driven understanding and refinement of schema mappings. In: SIGMOD Conference, pp. 485–496 (2001)
  21. Zamboulis, L., Fan, H., Belhajjame, K., Siepen, J.A., Jones, A.C., Martin, N.J., Poulovassilis, A., Hubbard, S.J., Embury, S.M., Paton, N.W.: Data access and integration in the ispider proteomics grid. In: Leser, U., Naumann, F., Eckman, B. (eds.) DILS 2006. LNCS (LNBI), vol. 4075, pp. 3–18. Springer, Heidelberg (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Lu Mao (1)
  • Khalid Belhajjame (1)
  • Norman W. Paton (1)
  • Alvaro A. A. Fernandes (1)
  1. School of Computer Science, University of Manchester, Manchester, UK
