Language Resources and Evaluation, Volume 50, Issue 2, pp 245–261

A comparative study of dictionaries and corpora as methods for language resource addition

Original Paper

DOI: 10.1007/s10579-016-9354-7

Cite this article as:
Mori, S. & Neubig, G. Lang Resources & Evaluation (2016) 50: 245. doi:10.1007/s10579-016-9354-7

Abstract

In this paper, we investigate the relative effects of two strategies for adding language resources to Japanese morphological analysis, a joint task of word segmentation (WS) and part-of-speech (POS) tagging. The first strategy is adding entries to the dictionary; the second is adding annotated sentences to the training corpus. The experimental results showed that adding annotated sentences to the training corpus is more effective than adding entries to the dictionary. Adding annotated sentences is particularly efficient when new words are added together with the contexts of several of their real occurrences as partially annotated sentences, i.e. sentences in which only some words are annotated with word boundary information. Based on this finding, we performed real annotation experiments on invention disclosure texts and measured word segmentation accuracy. Finally, we investigated various language resource addition cases and introduced the notions of non-maleficence, asymmetricity, and additivity of language resources for a task. In the WS case, we found that language resource addition is non-maleficent (adding new resources causes no harm in other domains) and sometimes additive (adding new resources helps other domains). We conclude that it is reasonable for NLP tool providers like us to distribute only one general-domain model trained on all the language resources we have.
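To make the idea of a partially annotated sentence concrete, the sketch below shows one possible way to represent and consume such data. It assumes, and this is not taken from the paper, a pointwise word segmenter that makes an independent boundary decision at each gap between adjacent characters. Under that assumption, gaps without annotation can simply be skipped when building training examples. The markup convention (`|` for a boundary, `-` for an explicit non-boundary, nothing for an unannotated gap) and the helper names `from_markup`, `gap_features`, and `training_examples` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: a sentence plus one label per gap between adjacent
# characters: True = word boundary, False = no boundary, None = not annotated.
PartialSentence = Tuple[str, List[Optional[bool]]]


def from_markup(marked: str) -> PartialSentence:
    """Parse a hypothetical markup: '|' marks a boundary, '-' marks an explicit
    non-boundary, and a gap with no symbol is left unannotated (None)."""
    chars: List[str] = []
    labels: List[Optional[bool]] = []
    pending: Optional[bool] = None
    for ch in marked:
        if ch == "|":
            pending = True
        elif ch == "-":
            pending = False
        else:
            if chars:  # there is a gap between the previous char and this one
                labels.append(pending)
            chars.append(ch)
            pending = None
    return "".join(chars), labels


def gap_features(text: str, i: int, window: int = 2) -> List[str]:
    """Character unigram and bigram features for the gap between text[i] and
    text[i + 1]; offsets are relative to position i."""
    left = max(0, i - window + 1)
    right = min(len(text), i + 1 + window)
    feats = [f"U{j - i}:{text[j]}" for j in range(left, right)]
    feats += [f"B{j - i}:{text[j:j + 2]}" for j in range(left, right - 1)]
    return feats


def training_examples(sent: PartialSentence) -> List[Tuple[List[str], bool]]:
    """Emit (features, label) pairs only for annotated gaps; unannotated gaps
    contribute nothing, which is what lets partial annotation be used directly."""
    text, labels = sent
    return [(gap_features(text, i), lab)
            for i, lab in enumerate(labels) if lab is not None]
```

For example, `from_markup("a|b-cde|f")` yields the text `"abcdef"` with gap labels `[True, False, None, None, True]`, and `training_examples` on that result produces three examples, one for each annotated gap. The same markup works unchanged for Japanese text, since Python strings iterate by character.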

Keywords

Partial annotation; Domain adaptation; Dictionary; Word segmentation; POS tagging; Non-maleficence of language resources

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
  2. Nara Institute of Science and Technology, Ikoma, Japan