GODDAG: A Data Structure for Overlapping Hierarchies

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2023)

Abstract

Notations like SGML and XML represent document structures using tree structures; while this is in general a step forward from earlier systems, it creates certain difficulties for the representation of documents in which the structures of interest are not properly nested. Overlapping structures, discontinuous structures, and material which occurs in different orders in different parts, views, or versions of a document are all problems for SGML and XML. Overlapping structures have received attention from a variety of authors on SGML and XML, who have proposed various solutions including the use of non-SGML notations with translation into SGML for processing, the use of the CONCUR feature of SGML, exploitation of conditional marked sections in the DTD and document instance, the imposition of various kinds of unusual interpretations on SGML/XML elements as milestones or as fragments of some larger ‘virtual’ element, or the use of detailed annotation separate from the base text being annotated.

An alternative is the use of a non-SGML/XML notation which does not require that elements form a hierarchical structure. One such notation, MECS, was developed by one of the authors and has been used in practice for over a decade. Unfortunately, the element structure of a MECS document cannot conveniently be represented as a tree, so that MECS processors lack the assistance provided to SGML/XML processors by the unifying assumption of a simple standard data structure for the document. We propose a data structure for representing documents with overlapping structures (including MECS documents). As in the conventional tree representation of SGML and XML, elements are represented by nodes in a graph, and the character data content of the document by labels on the leaves of the graph. We use a directed acyclic graph in which an arc from node a to node b indicates that b is a child of a. Unlike SGML/XML trees, our graph structure allows children to have multiple parents. In the general form of the data structure, an ordering is imposed on the children of each node; this gives the data structure its name: general ordered-descendant directed acyclic graph (GODDAG). A restricted form of GODDAG, in which an ordering is imposed on the leaves of the graph, cannot handle multiple orderings of the same material but can represent any legal MECS document.
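The structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `add_child`, `leaves`, and `content` are hypothetical, and the example text and element names are invented for illustration. It shows only the properties stated in the abstract: element nodes with ordered children, character data on the leaves, and a leaf shared by two overlapping parents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if they are the same object
class Node:
    """Hypothetical GODDAG node: elements carry a name, leaves carry text."""
    name: Optional[str] = None   # element name (None for leaf nodes)
    text: Optional[str] = None   # character data (None for element nodes)
    # Ordered children; a node may also have several parents.
    children: List["Node"] = field(default_factory=list, repr=False)
    parents: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Node") -> None:
        # An arc from self to child records that child is a child of self.
        self.children.append(child)
        child.parents.append(self)


def leaves(node: Node) -> List[Node]:
    """Leaves reachable from node, in child order, each leaf listed once."""
    seen, out = set(), []

    def walk(n: Node) -> None:
        if not n.children:
            if id(n) not in seen:
                seen.add(id(n))
                out.append(n)
            return
        for c in n.children:
            walk(c)

    walk(node)
    return out


def content(node: Node) -> str:
    """Character data dominated by node (empty elements contribute nothing)."""
    return "".join(leaf.text or "" for leaf in leaves(node))


# Invented example: a metrical line and a phrase that overlap on leaf b.
a, b, c = Node(text="To be, "), Node(text="or not "), Node(text="to be")
line, phrase, root = Node(name="line"), Node(name="phrase"), Node(name="text")
line.add_child(a)
line.add_child(b)
phrase.add_child(b)   # b now has two parents, which a tree cannot express
phrase.add_child(c)
root.add_child(line)
root.add_child(phrase)
```

Here `content(line)` and `content(phrase)` both include the text of `b`, while `content(root)` visits it only once. This sketch orders children per node; the restricted form mentioned in the abstract would instead rely on a single ordering of the leaves.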

The data structure here proposed should be useful in the representation of naturally occurring documents with complex structures; it may also be useful in other applications.

Keywords

  • Leaf Node
  • Element Node
  • Markup Language
  • Open Element
  • Metrical Line

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sperberg-McQueen, C.M., Huitfeldt, C. (2004). GODDAG: A Data Structure for Overlapping Hierarchies. In: King, P., Munson, E.V. (eds) Digital Documents: Systems and Principles. PODDP 2000. Lecture Notes in Computer Science, vol 2023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39916-2_12

  • DOI: https://doi.org/10.1007/978-3-540-39916-2_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21070-2

  • Online ISBN: 978-3-540-39916-2

  • eBook Packages: Springer Book Archive