Abstract
On the basis of tree-regular language theory, we study document transformation and schema transformation. A document is represented by a tree t, and a schema is represented by a tree-regular language L. Document transformation is defined as the composition of a marking function $m_P^C$ and a linear tree homomorphism h, where P is a pattern and C is a contextual condition. The pattern P is a tree-regular language, and the contextual condition C is a pointed tree representation. The marking function $m_P^C$ marks a node if the subtree rooted at this node matches P and the envelope (the rest of the tree) satisfies C. The linear tree homomorphism h (Gécseg and Steinby [5]) then rewrites the tree, for example by deleting or renaming marked nodes. Schema transformation is defined by naturally extending document transformation; that is, the result of transforming a schema L, denoted $h(m_P^C(L))$, is $\{h(m_P^C(t)) \mid t \in L\}$. Given a tree automaton that accepts L, we can effectively construct a tree automaton that accepts $h(m_P^C(L))$. This observation provides a theoretical basis for document transformation engines and document database systems.
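The two-step transformation described above can be illustrated with a small sketch. This is not the paper's automaton construction: here P and C are stood in for by ordinary Python predicates (`in_P`, `in_C`), the envelope is modelled as the tree with the candidate subtree replaced by a hypothetical `HOLE` leaf, and h is a simple renaming of marked nodes. All names (`Tree`, `HOLE`, `envelope`, `mark`, `rename_marked`, `hole_parent_label`) and the sample document are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()
    marked: bool = False


# Hypothetical placeholder for the position of the removed subtree.
# Assumes no real document node is an unmarked leaf labelled "*".
HOLE = Tree("*")


def envelope(root: Tree, path: Tuple[int, ...]) -> Tree:
    """Return `root` with the subtree at `path` replaced by HOLE."""
    if not path:
        return HOLE
    kids = list(root.children)
    kids[path[0]] = envelope(kids[path[0]], path[1:])
    return Tree(root.label, tuple(kids), root.marked)


def mark(root: Tree,
         in_P: Callable[[Tree], bool],
         in_C: Callable[[Tree], bool]) -> Tree:
    """Mark each node whose subtree satisfies in_P and whose envelope satisfies in_C."""
    def go(node: Tree, path: Tuple[int, ...]) -> Tree:
        kids = tuple(go(c, path + (i,)) for i, c in enumerate(node.children))
        hit = in_P(node) and in_C(envelope(root, path))
        return Tree(node.label, kids, hit)
    return go(root, ())


def rename_marked(node: Tree, new_label: str) -> Tree:
    """A linear rewrite: every child is used exactly once; marked nodes are renamed."""
    kids = tuple(rename_marked(c, new_label) for c in node.children)
    return Tree(new_label if node.marked else node.label, kids)


def hole_parent_label(env: Tree) -> Optional[str]:
    """Label of the node directly above HOLE, or None if HOLE is the root."""
    for c in env.children:
        if c == HOLE:
            return env.label
        found = hole_parent_label(c)
        if found is not None:
            return found
    return None


# Sample document: doc(sec(fig, para), fig)
doc = Tree("doc", (Tree("sec", (Tree("fig"), Tree("para"))), Tree("fig")))

# Stand-in for P: the subtree is a single "fig" leaf.
in_P = lambda t: t.label == "fig" and not t.children
# Stand-in for C: the node sits directly under a "sec" node.
in_C = lambda env: hole_parent_label(env) == "sec"

marked = mark(doc, in_P, in_C)
result = rename_marked(marked, "figure")

# Schema transformation on a finite sample of L: {h(m(t)) | t in L}.
sample_L = {doc, Tree("doc", (Tree("fig"),))}
transformed_L = {rename_marked(mark(t, in_P, in_C), "figure") for t in sample_L}
```

In this sketch only the `fig` leaf inside `sec` is renamed; the top-level `fig` matches the pattern but fails the contextual condition. The set comprehension at the end works only for a finite sample of trees, whereas the result stated in the abstract operates on a tree automaton accepting L and therefore covers infinite schemas.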
References
Arnon, D.: Scrimshaw: A language for document queries and transformations. Electronic Publishing — Origination, Dissemination, and Design 6 (1993) 385–396
Christophides, V., Abiteboul, S., Cluet, S., Scholl, M.: From structured documents to novel query facilities. In SIGMOD 1994, (1994) 313–324
Colby, L.: An algebra for list-oriented applications. Technical Report TR 347, Indiana University, Bloomington, Indiana 47405–4101, (1992)
Gyssens, M., Paredaens, J., and Van Gucht, D.: A grammar-based approach towards unifying hierarchical data models. SIAM Journal on Computing, 23, (1994) 1093–1097
Gécseg, F., and Steinby, M.: Tree automata. Akadémiai Kiadó, Budapest, Hungary, 1984.
Hoffmann, C., and O'Donnell, M.: Pattern matching in trees. Journal of the ACM. 29(1): (1982) 68–95
International Organization for Standardization. Information Processing — Text and Office Systems — Standard Generalized Markup Language (SGML), 1986.
International Organization for Standardization. Information Technology — Text and Office Systems — Hypermedia/Time-based Structuring Language (HyTime), 1992.
International Organization for Standardization. Information Technology — Text and Office Systems — Document Style Semantics and Specification Language (DSSSL), 1994.
Loeffen, A.: Text databases: a survey of text models and systems. SIGMOD Record, 23(1): (1994) 97–106
Nivat, M. and Podelski, A.: Another variation on the common subexpression problem. Theoretical Computer Science, 114, (1993) 11–11
Podelski, A.: A monoid approach to tree automata. In Nivat and Podelski, editors, Tree Automata and Languages, Studies in Computer Science and Artificial Intelligence 10. North-Holland, (1992) 11–11
Wilhelm, R.: Tree transformations, functional languages, and attribute grammars. In Pierre Deransart and Martin Jourdan, editors, Attribute grammars and their applications, Springer-Verlag 461, (1990) 116–129
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Murata, M. (1997). Transformation of documents and schemas by patterns and contextual conditions. In: Nicholas, C., Wood, D. (eds) Principles of Document Processing. PODP 1996. Lecture Notes in Computer Science, vol 1293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63620-X_61
DOI: https://doi.org/10.1007/3-540-63620-X_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63620-5
Online ISBN: 978-3-540-69614-8
eBook Packages: Springer Book Archive