Encoding Syntactic Annotation

Treebanks

Part of the book series: Text, Speech and Language Technology (TLTB, volume 20)

Abstract

There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while at the same time enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools. To address this need, we have developed a framework built on an abstract model that covers a variety of annotation types (e.g., morpho-syntactic tagging, syntactic annotation, coreference annotation) and can be instantiated in different ways depending on the annotator’s approach and goals. The results have been incorporated into XCES (Ide et al., 2000), the XML instantiation of the Corpus Encoding Standard (Ide, 1998a,b), which provides a ready-made, standard encoding format together with a data architecture designed specifically for linguistically annotated corpora.

References

  • Basili, R., Pazienza, M. T., Zanzotto, F.M. (1999). Lexicalizing a Shallow Parser. Proceedings TALN’99, Cargèse, Corsica, p. 25–34.

  • Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., Liberman, M. (2000). ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation. Proceedings of the Second Language Resources and Evaluation Conference (LREC), Athens, Greece, p. 1699–1706.

  • Böhmová, A., Hajič, J., Hajičová, E., Hladká, B. (2003). The Prague Dependency Treebank: A Three-Level Annotation Scenario. This volume.

  • Biron, P., Malhotra, A. (2000). XML Schema Part 2: Datatypes. W3C Candidate Recommendation. http://www.w3.org/TR/xmlschema-2/.

  • Brants, T., Skut, W., Uszkoreit, H. (2003). Syntactic Annotation of a German Newspaper Corpus. This volume.

  • Bray, T., Paoli, J., Sperberg-McQueen, C.M. (eds.) (1998). Extensible Markup Language (XML) Version 1.0. W3C Recommendation. http://www.w3.org/TR/1998/REC-xml-19980210.

  • Brickley, D., Guha, R.V. (2000). Resource Description Framework (RDF) Schema Specification 1.0. W3C Candidate Recommendation, 27 March 2000. http://www.w3.org/TR/rdf-schema/.

  • Carroll, J., Minnen, G., Briscoe, T. (2003). Parser Evaluation Using a Grammatical Relation Annotation Scheme. This volume.

  • Clark, J. (ed.) (1999). XSL Transformations (XSLT). Version 1.0. W3C Recommendation. http://www.w3.org/TR/xslt.

  • Daniel, R., DeRose, S., Maler, E. (2001). XML Pointer Language (XPointer) Version 1.0. W3C Recommendation. http://www.w3.org/TR/xptr.

  • Grefenstette, G. (1999). Shallow Parsing Techniques Applied to Medical Terminology Discovery and Normalization. Proceedings of IMIA WG6, Triennial Conference on Natural Language and Medical Concept Representation. Phoenix, AZ.

  • Harrison, P., Abney, S., Black, E., Flickinger, D., Gdaniec, C., Grishman, R., Hindle, D., Ingria, B., Marcus, M., Santorini, B., Strzalkowski, T. (1991). Evaluating syntax performance of parser/grammars of English. Proceedings of the Workshop on Evaluating Natural Language Processing Systems. 29th Meeting of the Association for Computational Linguistics, Berkeley, CA, p. 71–77.

  • Ide, N. (1998a). Encoding Linguistic Corpora. Proceedings of the Sixth Workshop on Very Large Corpora, p. 9–17.

  • Ide, N. (1998b). Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora. Proceedings of the First International Language Resources and Evaluation Conference, p. 463–70.

  • Ide, N., Bonhomme, P., Romary, L. (2000). XCES: An XML-based Standard for Linguistic Corpora. Proceedings of the Second Language Resources and Evaluation Conference (LREC), Athens, Greece, p. 825–30.

  • Järvinen, T. (2003). Bank of English and Beyond: Hand-crafted Parsers for Functional Annotation. This volume.

  • Lassila, O., Swick, R. (1999). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. http://www.w3.org/TR/REC-rdf-syntax.

  • Leech, G., Barnett, R., Kahrel, P. (1996). EAGLES Recommendations for the Syntactic Annotation of Corpora. EAG-TCWG-SASG/1.8. http://www.ilc.pi.cnr.it/EAGLES/segsasgl/segsasgl.html.

  • Marcus, M., Santorini, B., Marcinkiewicz, M.A. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19, 2, p. 313–30.

  • Sleator, D., Temperley, D. (1993). Parsing English with a Link Grammar. Proceedings of the 3rd International Conference on Parsing Technologies, IWPT’93.

  • Tapanainen, P., Järvinen, T. (1997). A Non-projective Dependency Parser. Proceedings of ANLP’97, Washington D.C., p. 64–71.

  • Taylor, A., Marcus, M., Santorini, B. (2003). The Penn Treebank: An Overview. This volume.

  • Thompson, H., Beech, D., Maloney, M., Mendelsohn, N. (eds.) (2000). XML Schema Part 1: Structures. W3C Candidate Recommendation, 24 October 2000. http://www.w3.org/TR/xmlschema-1/.

  • Wallis, S. (2003). Completing Parsed Corpora: From Correction to Evolution. This volume.

Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Ide, N., Romary, L. (2003). Encoding Syntactic Annotation. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_16

  • DOI: https://doi.org/10.1007/978-94-010-0201-1_16

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1335-5

  • Online ISBN: 978-94-010-0201-1

  • eBook Packages: Springer Book Archive
