The Possibilities of Automatic Detection/Correction of Errors in Tagged Corpora: A Pilot Study on a German Corpus

  • Karel Oliva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2166)


The performance of taggers is usually evaluated by their percentual success rate. Because of the pure quantitativity of such an approach, all errors committed by the tagger are treated on a par for the purpose of the evaluation. This paper takes a different, qualitative stand on the topic, arguing that the previous viewpoint is not linguistically adequate: the errors (might) differ in severity. General implications for tagging are discussed, and a simple method is proposed and exemplified, able to

  1. detect and in some cases even rectify the most severe errors and thus
  2. contribute to arriving finally at a better tagged corpus.

Some encouraging results achieved by a very simple, manually performed test and evaluation on a small sample of a corpus are given.
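The contrast between a purely quantitative success rate and a severity-aware view of tagging errors can be illustrated with a small sketch. It is not the method of the paper, which the abstract does not spell out. The function names, the STTS-style tags in the toy sentence, and above all the severity weights are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Confusion = Tuple[str, str]  # (gold tag, predicted tag)


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Plain success rate: share of tokens whose tag matches the gold tag."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def severity_cost(
    gold: List[str],
    predicted: List[str],
    severity: Dict[Confusion, float],
    default: float = 1.0,
) -> float:
    """Sum of per-error costs; confusions not listed get the default cost."""
    if len(gold) != len(predicted):
        raise ValueError("tag sequences must be of equal length")
    return sum(
        severity.get((g, p), default)
        for g, p in zip(gold, predicted)
        if g != p
    )


# Hypothetical weights (an assumption, not taken from the paper):
# mistaking a relative pronoun for an article is treated as severe,
# mistaking a common noun for a proper noun as mild.
SEVERITY: Dict[Confusion, float] = {
    ("PRELS", "ART"): 5.0,
    ("NN", "NE"): 0.5,
}

gold = ["ART", "NN", "PRELS", "NN", "VVFIN"]
tagger_a = ["ART", "NN", "ART", "NN", "VVFIN"]    # one severe error
tagger_b = ["ART", "NE", "PRELS", "NN", "VVFIN"]  # one mild error

print(accuracy(gold, tagger_a), accuracy(gold, tagger_b))
print(severity_cost(gold, tagger_a, SEVERITY),
      severity_cost(gold, tagger_b, SEVERITY))
```

Both hypothetical taggers make exactly one error and therefore share the same success rate, yet under the assumed weights their severity costs differ by a factor of ten. This is the kind of distinction a purely quantitative evaluation cannot express.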


Ambiguity Resolution, Subordinate Clause, Relative Pronoun, Test Corpus, German Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Karel Oliva (1)
  1. Austrian Research Institute for Artificial Intelligence, Wien, Austria
