Data Mapping as Search

  • George H. L. Fletcher
  • Catharine M. Wyss
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

In this paper, we describe and situate the tupelo system for data mapping in relational databases. Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. Starting from user-provided example instances of the source and target schemas, tupelo approaches mapping discovery as search within the transformation space of these instances based on a set of mapping operators. tupelo mapping expressions incorporate not only data-metadata transformations, but also simple and complex semantic transformations, resulting in significantly wider applicability than previous systems. Extensive empirical validation of tupelo, both on synthetic and real-world datasets, indicates that the approach is both viable and effective.
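
The idea of mapping discovery as search can be illustrated with a small sketch. The abstract does not list tupelo's operators or its search strategy, so everything below is an assumption made for illustration: the relation encoding, the two toy operators (`rename` and `drop`), the helper names, and the use of plain breadth-first search. The sketch starts from a source example instance and applies operators until it reaches the target example instance; the sequence of operators found is the mapping expression.

```python
from collections import deque

# A relation is modeled as (column names, set of rows). This encoding is an
# assumption of the sketch, not tupelo's internal representation.
def make_relation(columns, rows):
    return (tuple(columns), frozenset(tuple(r) for r in rows))

# Two hypothetical mapping operators, chosen only to keep the example small.
def rename(rel, old, new):
    cols, rows = rel
    return (tuple(new if c == old else c for c in cols), rows)

def drop(rel, name):
    cols, rows = rel
    i = cols.index(name)
    return (cols[:i] + cols[i + 1:],
            frozenset(r[:i] + r[i + 1:] for r in rows))

def successors(rel, target):
    """Yield (operator, resulting relation) for every applicable operator."""
    cols, _ = rel
    for c in cols:
        yield ("drop", c), drop(rel, c)
        for t in target[0]:
            if t not in cols:  # avoid creating duplicate column names
                yield ("rename", c, t), rename(rel, c, t)

def discover_mapping(source, target, max_depth=4):
    """Breadth-first search from the source instance to the target instance.

    Returns a list of operator applications, or None if the target is not
    reachable within max_depth steps.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        if len(path) >= max_depth:
            continue
        for op, nxt in successors(state, target):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [op]))
    return None

source = make_relation(["name", "addr", "phone"], [("Ann", "1 Main", "555")])
target = make_relation(["customer", "phone"], [("Ann", "555")])
mapping = discover_mapping(source, target)
```

Here `mapping` is a two-step expression that renames `name` to `customer` and drops `addr`. Uninformed breadth-first search is used only because it is the simplest strategy to read; a practical system would need richer operators and a guided search to cope with the size of the transformation space.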

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • George H. L. Fletcher (1)
  • Catharine M. Wyss (1)
  1. Computer Science Department, School of Informatics, Indiana University, Bloomington, USA
