Schemas for Integration and Translation of Structured and Semi-structured Data
With the emergence of the Web as a universal data repository, research has recently focused on data integration and data translation, and a common data model for semistructured data has been established. It is increasingly recognized, however, that a common schema model is also necessary to support tasks such as query formulation, decomposition and optimization, and the declarative specification of data translation. In this paper we elaborate on the theoretical foundations of a middleware schema model. We present expressive and flexible schema definition languages and investigate properties, such as expressive power and the complexity of decision problems, that are significant in the context of data translation and integration.
Keywords: Regular Expression, Data Graph, Expressive Power, Parse Tree, Virtual Node