Morphological Disambiguation of Turkish Text with Perceptron Algorithm

  • Haşim Sak
  • Tunga Güngör
  • Murat Saraçlar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4394)

Abstract

This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Because of the ambiguity caused by this complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. The methodology is based on ranking with the perceptron algorithm, which has been successful in some NLP tasks in English. We use a baseline statistical trigram-based model from previous work to enumerate an n-best list of candidate morphological parse sequences for each sentence. We then apply the perceptron algorithm to rerank the n-best list using a set of 23 features. The perceptron trained to do morphological disambiguation improves the accuracy of the baseline model from 93.61% to 96.80%. When we train the perceptron as a POS tagger, the accuracy is 98.27%. The Turkish morphological disambiguation and POS tagging results that we obtained are the best reported so far.
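The reranking step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the 23 features, so the feature names below (`baseline_logprob`, `bad_trigram`, `good_trigram`) are hypothetical, each candidate parse sequence is assumed to be already converted to a feature dictionary, and weight averaging is an assumption added here because it is a common choice for perceptron rerankers.

```python
from collections import defaultdict


def score(weights, feats):
    """Dot product of a weight mapping and one candidate's feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rerank(weights, candidates):
    """Index of the highest-scoring candidate in an n-best list.

    Ties go to the earliest candidate, i.e. the baseline model's order.
    """
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train(examples, epochs=5):
    """Train a reranking perceptron and return averaged weights.

    examples: list of (candidates, gold_index) pairs, where candidates is
    the n-best list for one sentence (a list of feature dicts) and
    gold_index points at the correct parse sequence in that list.
    Averaging is an assumption of this sketch, not taken from the abstract.
    """
    weights = defaultdict(float)
    totals = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for candidates, gold in examples:
            predicted = rerank(weights, candidates)
            if predicted != gold:
                # Move weights toward the gold candidate's features
                # and away from the wrongly preferred candidate's features.
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[predicted].items():
                    weights[name] -= value
            steps += 1
            # Accumulate the current weights for averaging.
            for name, value in weights.items():
                totals[name] += value
    return {name: total / steps for name, total in totals.items()}


# Toy 2-best list for one sentence (hypothetical features and values).
# The baseline prefers candidate 0, but candidate 1 is the gold parse.
toy_candidates = [
    {"baseline_logprob": -1.0, "bad_trigram": 1.0},
    {"baseline_logprob": -2.0, "good_trigram": 1.0},
]
toy_weights = train([(toy_candidates, 1)], epochs=3)
toy_choice = rerank(toy_weights, toy_candidates)
```

In this toy case the untrained model falls back to the baseline order, while the trained weights select the gold candidate. A real setup would generate the n-best lists with the baseline trigram model and extract features from the morphological parses themselves.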

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Haşim Sak (1)
  • Tunga Güngör (1)
  • Murat Saraçlar (2)

  1. Dept. of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
  2. Dept. of Electrical and Electronic Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
