Language Resources and Evaluation, Volume 40, Issue 2, pp 175–181

Tagging Icelandic text: an experiment with integrations and combinations of taggers

  • Hrafn Loftsson
Original Paper


We use integrations and combinations of taggers to improve the tagging accuracy of Icelandic text. The best-performing integrated tagger, which uses our linguistic rule-based tagger for initial disambiguation and a trigram tagger for full disambiguation, achieves an accuracy of 91.80%. Combining five different taggers by simple voting yields 93.34% accuracy. Adding two linguistically motivated rules to the combined tagger raises the accuracy to 93.48%. This method reduces the error rate by 20.5% relative to the best-performing tagger in the combination pool.
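
As a rough illustration of the two ideas in the abstract, the sketch below shows simple (unweighted) voting over the outputs of several taggers and the arithmetic behind a relative error-rate reduction. It is not the paper's implementation: the function names, the placeholder tags "A" and "B", and the tie-breaking rule (prefer the tagger listed first) are assumptions made for this example. The baseline of 91.80% is also an assumption, since the abstract does not say outright that the integrated tagger is the best tagger in the combination pool.

```python
from collections import Counter


def simple_vote(tag_sequences):
    """Combine per-token tag proposals by simple (unweighted) voting.

    tag_sequences: one list of tags per tagger, all of the same length.
    Ties go to the tagger listed first (an assumption of this sketch).
    """
    combined = []
    for proposals in zip(*tag_sequences):
        counts = Counter(proposals)
        top = max(counts.values())
        # Pick the first proposal, in tagger order, that reaches the top count.
        combined.append(next(tag for tag in proposals if counts[tag] == top))
    return combined


def relative_error_reduction(baseline_acc, new_acc):
    """Relative error-rate reduction in percent, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Three hypothetical taggers proposing placeholder tags for two tokens.
votes = simple_vote([["A", "B"], ["A", "A"], ["B", "A"]])

# Accuracies taken from the abstract; the choice of baseline is an assumption.
reduction = relative_error_reduction(91.80, 93.48)
```

With the figures from the abstract, the error rate falls from 8.20% to 6.52%, and `reduction` rounds to 20.5, which matches the reported value under the stated baseline assumption.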


Keywords

Combination of taggers, Integration of taggers, Linguistically motivated rules, Simple voting, Tagging accuracy



  • data-driven taggers
  • Hidden Markov model
  • Icelandic frequency dictionary
  • linguistically motivated rules



Thanks to the Institute of Lexicography at the University of Iceland for providing access to the IFD corpus, and to Professor Y. Wilks for valuable comments and suggestions in the preparation of this paper.



Copyright information

© Springer Science+Business Media 2006

Authors and Affiliations

  1. Department of Computer Science, University of Sheffield, Sheffield, UK
  2. Department of Computer Science, Reykjavik University, Reykjavik, Iceland
