Reusing Linguistic Resources: Tasks and Goals for a Linked Data Approach
There is a need to share linguistic resources, but reuse is impaired by a number of constraints including lack of common formats, differences in conceptual notions, and unsystematic metadata. In this contribution, the five most important constraints and the tasks necessary to overcome these issues are detailed. These constraints lie in the design of linguistic resources, the way they are marked up and their metadata. These issues have also come up in a domain other than linguistics, namely in the semantic web, where the Linked Data approach proved useful. Experiences and lessons learnt from that domain are discussed in the light of standardisation and reconciliation of concepts and representations of linguistic annotations.
Keywords: Natural Language Processing, Link Data, Entity Recognition, Link Open Data, Linguistic Resource