Modelling Linguistic Annotations

Abstract

This chapter describes how linguistic annotations can be represented in RDF. Web Annotation and NIF provide the means to reference text segments on the web. Yet, representing linguistic annotations requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
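
As a rough illustration of referencing a text segment in the NIF style, the following minimal sketch emits N-Triples for one character span. It assumes the NIF 2.0 core namespace and its offset-based `#char=begin,end` fragment scheme; the function names `literal` and `nif_segment`, the document URI, and the `ex:pos` property in the `http://example.org/vocab#` namespace are hypothetical and stand in for whatever annotation vocabulary is actually chosen.

```python
# Sketch only: builds N-Triples by hand with the standard library.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/vocab#"  # hypothetical annotation vocabulary


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def nif_segment(doc_uri: str, text: str, begin: int, end: int, pos_tag: str) -> list[str]:
    """Describe text[begin:end] as a NIF string anchored in its context."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("offsets must satisfy 0 <= begin < end <= len(text)")
    context = f"<{doc_uri}#char=0,{len(text)}>"  # the whole document string
    segment = f"<{doc_uri}#char={begin},{end}>"  # the annotated span
    return [
        f"{context} {RDF_TYPE} <{NIF}Context> .",
        f"{context} <{NIF}isString> {literal(text)} .",
        f"{segment} {RDF_TYPE} <{NIF}String> .",
        f"{segment} <{NIF}referenceContext> {context} .",
        f'{segment} <{NIF}beginIndex> "{begin}"^^<{XSD}nonNegativeInteger> .',
        f'{segment} <{NIF}endIndex> "{end}"^^<{XSD}nonNegativeInteger> .',
        f"{segment} <{NIF}anchorOf> {literal(text[begin:end])} .",
        # Hypothetical property: a real annotation would use an agreed vocabulary.
        f"{segment} <{EX}pos> {literal(pos_tag)} .",
    ]


triples = nif_segment("http://example.org/doc1", "Berlin is a city.", 0, 6, "NOUN")
print("\n".join(triples))
```

The segment URI only identifies the span; the last triple is where a linguistic vocabulary comes in, which is why the choice of vocabulary matters for interoperability.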

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J. (2020). Modelling Linguistic Annotations. In: Linguistic Linked Data. Springer, Cham. https://doi.org/10.1007/978-3-030-30225-2_6

  • DOI: https://doi.org/10.1007/978-3-030-30225-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30224-5

  • Online ISBN: 978-3-030-30225-2

  • eBook Packages: Computer Science, Computer Science (R0)
