Abstract

The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms both for schema mapping creation via query discovery and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fagin, R., Haas, L.M., Hernández, M., Miller, R.J., Popa, L., Velegrakis, Y. (2009). Clio: Schema Mapping Creation and Data Exchange. In: Borgida, A.T., Chaudhri, V.K., Giorgini, P., Yu, E.S. (eds) Conceptual Modeling: Foundations and Applications. Lecture Notes in Computer Science, vol 5600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02463-4_12

  • DOI: https://doi.org/10.1007/978-3-642-02463-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02462-7

  • Online ISBN: 978-3-642-02463-4

  • eBook Packages: Computer Science (R0)
