Reusing Linguistic Resources: Tasks and Goals for a Linked Data Approach

Abstract

There is a need to share linguistic resources, but reuse is impaired by a number of constraints, including the lack of common formats, differences in conceptual notions, and unsystematic metadata. This contribution details the five most important constraints and the tasks necessary to overcome them. The constraints lie in the design of linguistic resources, the way they are marked up, and their metadata. Similar issues have come up in a domain other than linguistics, namely the Semantic Web, where the Linked Data approach proved useful. Experiences and lessons learnt from that domain are discussed in the light of standardisation and reconciliation of concepts and representations of linguistic annotations.

Keywords

Natural Language Processing; Link Data; Entity Recognition; Link Open Data; Linguistic Resource
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Web and Media Group, Computer Sciences Department, VU University, Amsterdam, The Netherlands
