Abstract
There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while at the same time enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools. To address this need, we have developed a framework built on an abstract model for a variety of annotation types (e.g., morpho-syntactic tagging, syntactic annotation, coreference annotation), which can be instantiated in different ways depending on the annotator’s approach and goals. The results have been incorporated into XCES (Ide et al., 2000), the XML instantiation of the Corpus Encoding Standard (Ide, 1998a,b), which provides a ready-made, standard encoding format together with a data architecture designed specifically for linguistically annotated corpora.
References
Basili, R., Pazienza, M. T., Zanzotto, F.M. (1999). Lexicalizing a Shallow Parser. Proceedings TALN’99, Cargèse, Corsica, p. 25–34.
Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., Liberman, M. (2000). ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation. Proceedings of the Second Language Resources and Evaluation Conference (LREC), Athens, Greece, p. 1699–1706.
Böhmová, A., Hajič, J., Hajičová, E., Hladká, B. (2003). The Prague Dependency Treebank: A Three-Level Annotation Scenario. This volume.
Biron, P., Malhotra, A. (2000). XML Schema Part 2: Datatypes. W3C Candidate Recommendation. http://www.w3.org/TR/xmlschema-2/.
Brants, T., Skut, W., Uszkoreit, H. (2003). Syntactic Annotation of a German Newspaper Corpus. This volume.
Bray, T., Paoli, J., Sperberg-McQueen, C.M. (eds.) (1998). Extensible Markup Language (XML) Version 1.0. W3C Recommendation. http://www.w3.org/TR/1998/REC-xml-19980210.
Brickley, D., Guha, R.V. (2000). Resource Description Framework (RDF) Schema Specification 1.0. W3C Candidate Recommendation, 27 March 2000. http://www.w3.org/TR/rdf-schema/.
Carroll, J., Minnen, G., Briscoe, T. (2003). Parser Evaluation Using a Grammatical Relation Annotation Scheme. This volume.
Clark, J. (ed.) (1999). XSL Transformations (XSLT) Version 1.0. W3C Recommendation. http://www.w3.org/TR/xslt.
Daniel, R., DeRose, S., Maler, E. (2001). XML Pointer Language (XPointer) Version 1.0. W3C Recommendation. http://www.w3.org/TR/xptr.
Grefenstette, G. (1999). Shallow Parsing Techniques Applied to Medical Terminology Discovery and Normalization. Proceedings of IMIA WG6, Triennial Conference on Natural Language and Medical Concept Representation. Phoenix, AZ.
Harrison, P., Abney, S., Black, E., Flickinger, D., Gdaniec, C., Grishman, R., Hindle, D., Ingria, B., Marcus, M., Santorini, B., Strzalkowski, T. (1991). Evaluating syntax performance of parser/grammars of English. Proceedings of the Workshop on Evaluating Natural Language Processing Systems. 29th Meeting of the Association for Computational Linguistics, Berkeley, CA, p. 71–77.
Ide, N. (1998a). Encoding Linguistic Corpora. Proceedings of the Sixth Workshop on Very Large Corpora, p. 9–17.
Ide, N. (1998b). Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora. Proceedings of the First International Language Resources and Evaluation Conference, p. 463–70.
Ide, N., Bonhomme, P., Romary, L. (2000). XCES: An XML-based Standard for Linguistic Corpora. Proceedings of the Second Language Resources and Evaluation Conference (LREC), Athens, Greece, p. 825–30.
Järvinen, T. (2003). Bank of English and Beyond: Hand-crafted Parsers for Functional Annotation. This volume.
Lassila, O., Swick, R. (1999). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation. http://www.w3.org/TR/REC-rdf-syntax.
Leech, G., Barnett, R., Kahrel, P. (1996). EAGLES Recommendations for the Syntactic Annotation of Corpora. EAG-TCWG-SASG/1.8. http://www.ilc.pi.cnr.it/EAGLES/segsasgl/segsasgl.html.
Marcus, M., Santorini, B., Marcinkiewicz, M.A. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19, 2, p. 313–30.
Sleator, D., Temperley, D. (1993). Parsing English with a Link Grammar. Proceedings of the 3rd International Conference on Parsing Technologies, IWPT’93.
Tapanainen, P., Järvinen, T. (1997). A Non-projective Dependency Parser. Proceedings of ANLP’97, Washington D.C., p. 64–71.
Taylor, A., Marcus, M., Santorini, B. (2003). The Penn Treebank: An Overview. This volume.
Thompson, H., Beech, D., Maloney, M., Mendelsohn, N. (eds.) (2000). XML Schema Part 1: Structures. W3C Candidate Recommendation, 24 October 2000. http://www.w3.org/TR/xmlschema-1/.
Wallis, S. (2003). Completing Parsed Corpora: From Correction to Evolution. This volume.
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Ide, N., Romary, L. (2003). Encoding Syntactic Annotation. In: Abeillé, A. (ed.) Treebanks. Text, Speech and Language Technology, vol 20. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0201-1_16
DOI: https://doi.org/10.1007/978-94-010-0201-1_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1335-5
Online ISBN: 978-94-010-0201-1
eBook Packages: Springer Book Archive