Abstract
Morphological analyzers are essential components of Turkish language processing pipelines because of the rich and complex morphology of Turkish. Previous work has usually focused on the two-level description of Turkish morphology, which has led to finite-state transducer (FST) implementations built around a lexicon and two-level rules. However, such two-level implementations are not robust to spelling errors or to new loan words entering the Turkish lexicon. In this paper, we introduce a statistical approach to analyzing Turkish word forms by training and comparing two seq2seq models on our annotated dataset. We treat the analysis of Turkish word forms as a machine translation problem, where the source sequence is a Turkish word form and the target sequence is a sequence of morphemes. Evaluating on three test sets of word forms from informal written language, we show that our approach analyzes Turkish words robustly and provides a strong baseline for sequential modeling of Turkish morphology.
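To make the translation framing concrete, the following sketch shows one way source and target sequences might be prepared for an encoder-decoder model: the characters of a word form form the source side, morpheme tokens form the target side, and the target is split into shifted decoder input and output sequences as in standard teacher-forced seq2seq training. This is a minimal sketch under stated assumptions: the example pairs, the simplified Oflazer-style tag names, the `<unk>` fallback, and all function names (`build_vocab`, `encode`, `teacher_forcing_pair`, `decode`) are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Sequence, Tuple

# Special tokens shared by the source and target vocabularies.
PAD, BOS, EOS, UNK = "<pad>", "<s>", "</s>", "<unk>"

# Hypothetical (word form, morpheme sequence) pairs. The tags are simplified
# and illustrative only; they are NOT the paper's annotation scheme.
PAIRS: List[Tuple[str, List[str]]] = [
    ("evlerimizden", ["ev", "+Noun", "+A3pl", "+P1pl", "+Abl"]),  # "from our houses"
    ("kitaplar", ["kitap", "+Noun", "+A3pl"]),                    # "books"
]


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign integer IDs to symbols in order of first appearance."""
    vocab = {PAD: 0, BOS: 1, EOS: 2, UNK: 3}
    for seq in sequences:
        for sym in seq:
            vocab.setdefault(sym, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap in <s> ... </s>, map to IDs (unseen symbols -> <unk>), right-pad."""
    ids = [vocab[BOS]] + [vocab.get(s, vocab[UNK]) for s in seq] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError(f"sequence of length {len(ids)} exceeds max_len={max_len}")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def teacher_forcing_pair(target_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Decoder input drops the last token; decoder output drops the first."""
    return target_ids[:-1], target_ids[1:]


def decode(ids: Sequence[int], vocab: Dict[str, int]) -> List[str]:
    """Map IDs back to symbols, skipping <s>/<pad> and stopping at </s>."""
    inv = {i: s for s, i in vocab.items()}
    out: List[str] = []
    for i in ids:
        sym = inv[i]
        if sym == EOS:
            break
        if sym not in (BOS, PAD):
            out.append(sym)
    return out


# Source side: characters of the word form. Target side: morpheme tokens.
src_vocab = build_vocab([list(word) for word, _ in PAIRS])
tgt_vocab = build_vocab([morphs for _, morphs in PAIRS])
SRC_LEN = max(len(word) for word, _ in PAIRS) + 2  # +2 for <s> and </s>
TGT_LEN = max(len(morphs) for _, morphs in PAIRS) + 2

enc_in = encode(list("kitaplar"), src_vocab, SRC_LEN)
tgt = encode(["kitap", "+Noun", "+A3pl"], tgt_vocab, TGT_LEN)
dec_in, dec_out = teacher_forcing_pair(tgt)
```

The resulting ID lists are the kind of input an encoder-decoder (for example an LSTM seq2seq model with attention) would consume: `enc_in` feeds the encoder, while `dec_in` and `dec_out` serve as the decoder's input and prediction target during training. Working at the character level on the source side means a misspelled or previously unseen word still maps to a valid input sequence; the `<unk>` fallback for unseen characters is our own simplification, not something the abstract describes.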
Notes
- 3. Implementations can be found at https://www.github.com/turkish-nlp-suite/turkish-morph-lexicon/tree/main/keras_code
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Altinok, D. (2023). A Statistical Approach to Analyzing Turkish Morphology. In: Laribi, M.A., Carbone, G., Jiang, Z. (eds) Advances in Automation, Mechanical and Design Engineering. SAMDE 2021. Mechanisms and Machine Science, vol 121. Springer, Cham. https://doi.org/10.1007/978-3-031-09909-0_12
DOI: https://doi.org/10.1007/978-3-031-09909-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09908-3
Online ISBN: 978-3-031-09909-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)