Unsupervised Training of a Finite-State Sliding-Window Part-of-Speech Tagger

  • Enrique Sánchez-Villamil
  • Mikel L. Forcada
  • Rafael C. Carrasco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3230)


A simple, robust sliding-window part-of-speech tagger is presented, and a method is given to estimate its parameters from an untagged corpus. Its performance is compared to that of a standard Baum-Welch-trained hidden-Markov-model part-of-speech tagger. The transformation of the tagger into a finite-state machine that behaves exactly as the tagger itself is demonstrated.
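As a rough illustration of the sliding-window idea only, the sketch below picks a tag for each word by looking at the ambiguity classes (sets of possible tags) of its immediate neighbours. The abstract does not give the window size, the form of the parameters, or the estimation method, so everything here is an assumption: the one-word left and right context, the `tag_sentence` function, the `BOUNDARY` marker, and the toy `LEXICON` and `SCORES` tables are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

AmbiguityClass = FrozenSet[str]                    # set of tags a word may take
Context = Tuple[AmbiguityClass, AmbiguityClass]    # (left class, right class)

# Hypothetical sentence-boundary class used to pad the window.
BOUNDARY: AmbiguityClass = frozenset({"#"})

# Toy lexicon (assumed): each word maps to its ambiguity class.
LEXICON: Dict[str, AmbiguityClass] = {
    "the": frozenset({"DET"}),
    "run": frozenset({"NOUN", "VERB"}),
    "ended": frozenset({"VERB"}),
}

# Toy parameters (assumed): a score for each tag given the neighbouring classes.
SCORES: Dict[Context, Dict[str, float]] = {
    (frozenset({"DET"}), frozenset({"VERB"})): {"NOUN": 0.9, "VERB": 0.1},
}


def tag_sentence(
    words: List[str],
    lexicon: Dict[str, AmbiguityClass],
    scores: Dict[Context, Dict[str, float]],
) -> List[str]:
    """Tag each word using only the ambiguity classes in a fixed window."""
    classes = [lexicon[w] for w in words]
    padded = [BOUNDARY] + classes + [BOUNDARY]
    tags: List[str] = []
    for i, own in enumerate(classes):
        # padded[i] is the left neighbour, padded[i + 2] the right neighbour.
        context = (padded[i], padded[i + 2])
        table = scores.get(context, {})
        # Only tags from the word's own ambiguity class are candidates;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(own), key=lambda t: table.get(t, 0.0))
        tags.append(best)
    return tags


print(tag_sentence(["the", "run", "ended"], LEXICON, SCORES))
```

Because each decision depends only on a bounded window of ambiguity classes, a tagger of this shape has finitely many distinct contexts, which is the kind of property that makes a finite-state formulation plausible.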


Hidden Markov Model, Ambiguous Word, Computational Linguistics, Word Class, Word Error Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Enrique Sánchez-Villamil (1)
  • Mikel L. Forcada (1)
  • Rafael C. Carrasco (1)
  1. Transducens, Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant
