Abstract
The Handbook of Linguistic Annotation provides a comprehensive survey of the development and state of the art of linguistic annotation of language resources, including annotation scheme design, annotation creation, physical format considerations, annotation tools, annotation use, and evaluation. The volume is divided into two parts: Part I contains survey chapters on the phases and considerations of an annotation project, and Part II consists of thirty-nine case studies describing major annotation projects for a broad range of linguistic phenomena.
Keywords
- Linguistic annotation
- Language resources
Notes
- 5. See chapter “Community standards” in this volume for an overview.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Ide, N. (2017). Introduction: The Handbook of Linguistic Annotation. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_1
DOI: https://doi.org/10.1007/978-94-024-0881-2_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences (R0)