Spanish Diacritic Error Detection and Restoration—A Survey

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)

Abstract

In this paper we address the problem of diacritic error detection and restoration: the task of identifying and correcting missing accents in text. In particular, we evaluate the performance of a simple technique based on a part-of-speech tagger, comparing it to other established methods for error detection and restoration: unigram frequency, decision lists, discriminative classifiers, a machine-translation-based method, and grapheme-based approaches. In languages such as Spanish (the focus here), diacritics play a key role in disambiguation, and our results show that a straightforward modification to an n-gram tagger achieves good performance in diacritic error identification without resorting to any specialized machinery. Our method should be applicable to any language in which diacritics are distributed comparably and play similar disambiguating roles.
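
To make the task concrete, the following is a minimal sketch of the unigram-frequency baseline named in the abstract. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (strip_accents, train_unigram, restore), the lowercasing of tokens, and the choice to strip only the acute accent and the diaeresis (leaving ñ intact) are all hypothetical choices made for this example.

```python
import unicodedata
from collections import Counter, defaultdict

# Combining marks treated as "diacritics" in this sketch (an assumption):
# acute accent (U+0301) and diaeresis (U+0308). The tilde of "ñ" (U+0303)
# is kept, since ñ is a separate letter in Spanish.
STRIPPED_MARKS = {"\u0301", "\u0308"}


def strip_accents(word):
    """Return the word with acute accents and diaereses removed."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", kept)


def train_unigram(tokens):
    """Map each stripped form to its most frequent accented form."""
    counts = defaultdict(Counter)
    for tok in tokens:
        tok = tok.lower()
        counts[strip_accents(tok)][tok] += 1
    return {base: forms.most_common(1)[0][0] for base, forms in counts.items()}


def restore(tokens, model):
    """Replace each token by the most frequent form seen in training.

    Tokens unseen in training are returned unchanged.
    """
    return [model.get(strip_accents(t.lower()), t) for t in tokens]


def detect_errors(tokens, model):
    """Return the indices of tokens that the baseline would change."""
    restored = restore(tokens, model)
    return [i for i, (a, b) in enumerate(zip(tokens, restored)) if a != b]
```

Trained on the toy sentence "él está en la casa y esta casa está cerca", this baseline restores "el esta en la casa" to "él está en la casa", but it also rewrites the demonstrative in "esta casa" as "está", because it always picks the more frequent form regardless of context. That limitation is the kind of ambiguity a context-sensitive method, such as a tagger-based one, is meant to resolve.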

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Colorado, Boulder, USA
  2. Wake Forest University, Winston-Salem, USA
