Clio: Schema Mapping Creation and Data Exchange

  • Ronald Fagin
  • Laura M. Haas
  • Mauricio Hernández
  • Renée J. Miller
  • Lucian Popa
  • Yannis Velegrakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5600)

Abstract

The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms both for schema mapping creation via query discovery and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.
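The following minimal Python sketch illustrates what it means to execute a non-procedural schema mapping as a data-exchange query. It rests on assumptions of our own. The source schema Emp(name, dept), the target schema Employee(name, deptId) and Department(deptId, dname), and the mapping itself are hypothetical, and the mapping is written by hand rather than discovered. The invented department identifier is produced by a Skolem function of the department name, which is one common way to instantiate an existential variable. The sketch is not Clio's query-generation algorithm.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Null:
    """A labelled null: a placeholder for a target value the mapping leaves unspecified."""
    label: str


# Hypothetical source instance of Emp(name, dept).
SOURCE_EMP: List[Tuple[str, str]] = [
    ("Alice", "Sales"),
    ("Bob", "Sales"),
    ("Carol", "R&D"),
]


def dept_id(dept: str) -> Null:
    # Skolem term f(dept): the same department name always yields the same
    # invented identifier (an assumption of this sketch, not fixed by the mapping).
    return Null(f"f({dept})")


def exchange(
    emp: List[Tuple[str, str]],
) -> Tuple[Set[Tuple[str, Null]], Set[Tuple[Null, str]]]:
    """Materialise a target instance for the hypothetical mapping
        forall n, d: Emp(n, d) -> exists X: Employee(n, X) and Department(X, d)
    by evaluating its source side as a query and emitting the target tuples."""
    employee: Set[Tuple[str, Null]] = set()
    department: Set[Tuple[Null, str]] = set()
    for name, dept in emp:   # every match of the source pattern Emp(n, d)
        x = dept_id(dept)    # instantiate the existential variable X
        employee.add((name, x))
        department.add((x, dept))
    return employee, department


if __name__ == "__main__":
    employee, department = exchange(SOURCE_EMP)
    for x, d in sorted(department, key=lambda t: t[1]):
        print(x.label, d)
```

The identifier depends only on the department name, so Alice and Bob share one Department tuple. Minting a fresh null for every source tuple would also satisfy the mapping, but it would repeat the Sales department.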

Keywords

Data Exchange, Schema Mapping, Target Schema, Primary Path, Query Graph

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ronald Fagin, IBM Almaden Research Center, San Jose, USA
  • Laura M. Haas, IBM Almaden Research Center, San Jose, USA
  • Mauricio Hernández, IBM Almaden Research Center, San Jose, USA
  • Renée J. Miller, University of Toronto, Toronto, Canada
  • Lucian Popa, IBM Almaden Research Center, San Jose, USA
  • Yannis Velegrakis, University of Trento, Trento, Italy
