A Statistical Approach to Analyzing Turkish Morphology

  • Conference paper
Advances in Automation, Mechanical and Design Engineering (SAMDE 2021)

Part of the book series: Mechanisms and Machine Science (Mechan. Machine Science, volume 121)

Abstract

Morphological analyzers are essential components of Turkish language processing pipelines because of the rich and complex morphology of Turkish. Previous work usually builds on a two-level description of Turkish morphology, which leads to finite-state transducer (FST) implementations driven by a lexicon and two-level rules. However, such two-level implementations are not robust to spelling errors or to new loanwords entering the Turkish lexicon. In this paper, we introduce a statistical approach to analyzing Turkish word forms by training and comparing two seq2seq models on our annotated dataset. We frame the analysis of Turkish word forms as a machine translation problem, where the source sequence is a Turkish word form and the target sequence is a sequence of morphemes. Evaluating on three test sets of word forms from informal written language, we show that our approach analyzes Turkish words robustly and provides a strong baseline for sequential modeling of Turkish morphology.
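Framing analysis as translation means each training example is a pair (word form, morpheme sequence) that must be turned into integer id sequences before a seq2seq model can consume it. The sketch below shows one plausible way to do that, under stated assumptions: the source side is split into characters, the target side is a whitespace-separated token sequence, and the tags `Noun`, `A3pl`, `A3sg`, `Loc` are hypothetical placeholders rather than the paper's tagset. The `exact_match` helper is likewise an assumed metric (whole-analysis accuracy); the abstract does not state which metric the paper reports.

```python
from typing import Dict, List, Sequence

# Special symbols for padding, sequence start and sequence end (assumed conventions).
PAD, BOS, EOS = "<pad>", "<s>", "</s>"


def source_tokens(word: str) -> List[str]:
    """Source side (assumption): the surface word form split into characters."""
    return list(word)


def target_tokens(analysis: str) -> List[str]:
    """Target side (assumption): a whitespace-separated morpheme/tag sequence."""
    return analysis.split()


def build_vocab(sequences: Sequence[List[str]]) -> Dict[str, int]:
    """Map every token to an integer id; special symbols take ids 0, 1, 2."""
    vocab = {PAD: 0, BOS: 1, EOS: 2}
    for seq in sequences:
        for tok in seq:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap tokens in BOS/EOS, map them to ids and right-pad to max_len."""
    ids = [vocab[BOS]] + [vocab[t] for t in tokens] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of word forms whose full predicted analysis equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = sum(target_tokens(p) == target_tokens(r)
               for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical training pairs: word form -> simplified morpheme/tag sequence.
pairs = [("evlerde", "ev Noun A3pl Loc"), ("evde", "ev Noun A3sg Loc")]
src_vocab = build_vocab([source_tokens(w) for w, _ in pairs])
tgt_vocab = build_vocab([target_tokens(a) for _, a in pairs])

print(encode(source_tokens("evde"), src_vocab, 8))  # [1, 3, 4, 7, 3, 2, 0, 0]
print(exact_match(["ev Noun A3pl Loc", "ev Noun A3pl Loc"],
                  [a for _, a in pairs]))            # 0.5
```

Because the source side in this sketch is character-level, a misspelled word or an unseen loanword still maps to a valid input sequence, which is one plausible reading of the robustness argument above; a lexicon-driven FST has no path for a stem it does not contain.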

Notes

  1. https://www.dumps.wikimedia.org/trwiki/
  2. https://www.github.com/turkish-nlp-suite/turkish-morph-lexicon
  3. Implementations can be found at https://www.github.com/turkish-nlp-suite/turkish-morph-lexicon/tree/main/keras_code
  4. https://www.github.com/turkish-nlp-suite/turkish-morph-lexicon/blob/main/tagset.md

Author information

Correspondence to Duygu Altinok.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Altinok, D. (2023). A Statistical Approach to Analyzing Turkish Morphology. In: Laribi, M.A., Carbone, G., Jiang, Z. (eds) Advances in Automation, Mechanical and Design Engineering. SAMDE 2021. Mechanisms and Machine Science, vol 121. Springer, Cham. https://doi.org/10.1007/978-3-031-09909-0_12
