A Support Vector Machine Approach to Dutch Part-of-Speech Tagging

  • Mannes Poel
  • Luite Stegeman
  • Rieks op den Akker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4723)

Abstract

Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This decomposition not only solves the space problems caused by the amount of data, it also improves the tagging time. The resulting tagger reaches an accuracy of 97.54%, which is quite good, and its tagging speed is reasonably good.
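The decomposition into common, uncommon and unknown words can be pictured as a routing step that decides which part of the committee handles a given word. The sketch below is only an illustration of that idea under stated assumptions: the function name `build_router`, the frequency threshold `common_min_count`, and the toy data are hypothetical, and the abstract does not say how the paper actually draws the boundary between common and uncommon words.

```python
from collections import Counter


def build_router(training_tokens, common_min_count=50):
    """Return a function that maps a word to 'common', 'uncommon' or 'unknown'.

    common_min_count is a hypothetical threshold chosen for illustration;
    the abstract does not state the criterion used in the paper.
    """
    # Count case-insensitive occurrences of each word in the training data.
    counts = Counter(w.lower() for w in training_tokens)

    def route(word):
        n = counts.get(word.lower(), 0)
        if n == 0:
            return "unknown"    # never seen in the training data
        if n >= common_min_count:
            return "common"     # seen at least common_min_count times
        return "uncommon"       # seen, but fewer than common_min_count times

    return route


# Toy data only; not taken from the corpus used in the paper.
train = ["de"] * 60 + ["fiets"] * 3
route = build_router(train, common_min_count=50)
groups = [route(w) for w in ["De", "fiets", "zonnebril"]]
```

In a full tagger, each group would be passed to its own classifier or set of classifiers, so that no single model has to be trained on the entire data set.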

Keywords

Part-of-Speech tagging, Support Vector Machines

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Mannes Poel (1)
  • Luite Stegeman (1)
  • Rieks op den Akker (1)
  1. Human Media Interaction, Dept. Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
