Turkish and Its Challenges for Language and Speech Processing

  • Kemal Oflazer
  • Murat Saraçlar
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

We present a short survey and exposition of some of the important aspects of Turkish that have proved interesting and challenging for natural language and speech processing. Most of the challenges stem from the complex morphology of Turkish and from the way morphology interacts with syntax. Finally, we provide a short overview of the major tools and resources developed for Turkish over the last two decades. (Parts of this chapter were previously published as Oflazer (Lang Resour Eval 48(4):639–653, 2014).)

References

  1. Aksan Y, Aksan M, Koltuksuz A, Sezer T, Mersinli Ü, Demirhan UU, Yılmazer H, Kurtoğlu Ö, Öz S, Yıldız İ (2012) Construction of the Turkish National Corpus (TNC). In: Proceedings of LREC, Istanbul, pp 3223–3227
  2. Arısoy E (2009) Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition. PhD thesis, Boğaziçi University, Istanbul
  3. Arısoy E, Can D, Parlak S, Sak H, Saraçlar M (2009) Turkish broadcast news transcription and retrieval. IEEE Trans Audio Speech Lang Process 17(5):874–883
  4. Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford University, Stanford
  5. Bilgin O, Çetinoğlu Ö, Oflazer K (2004) Building a Wordnet for Turkish. Rom J Inf Sci Technol 7(1–2):163–172
  6. Buchholz S, Marsi E (2006) CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of CONLL, New York, NY, pp 149–164
  7. Butt M, Dyvik H, King TH, Masuichi H, Rohrer C (2002) The parallel grammar project. In: Proceedings of the workshop on grammar engineering and evaluation, Taipei, pp 1–7
  8. Can F, Koçberber S, Balçık E, Kaynak C, Öcalan HC, Vursavaş OM (2008) Information retrieval on Turkish texts. J Am Soc Inf Sci Technol 59(3):407–421
  9. Çetinoğlu Ö (2009) A large scale LFG grammar for Turkish. PhD thesis, Sabancı University, Istanbul
  10. Chelba C, Hazen TJ, Saraçlar M (2008) Retrieval and browsing of spoken content. IEEE Signal Process Mag 25(3):39–49
  11. Durgar-El Kahlout İ (2009) A prototype English-Turkish statistical machine translation system. PhD thesis, Sabancı University, Istanbul
  12. Durgar-El Kahlout İ, Oflazer K (2010) Exploiting morphology and local word reordering in English-to-Turkish phrase-based statistical machine translation. IEEE Trans Audio Speech Lang Process 18(6):1313–1322
  13. Eryiğit G, Oflazer K (2006) Statistical dependency parsing of Turkish. In: Proceedings of EACL, Trento, pp 89–96
  14. Eryiğit G, Nivre J, Oflazer K (2008) Dependency parsing of Turkish. Comput Linguist 34(3):357–389
  15. Göksel A, Kerslake C (2005) Turkish: a comprehensive grammar. Routledge, London
  16. Hakkani-Tür DZ, Oflazer K, Tür G (2002) Statistical morphological disambiguation for agglutinative languages. Comput Hum 36(4):381–410
  17. Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL, Prague, pp 177–180
  18. Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki
  19. Külekçi MO (2006) Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish. PhD thesis, Sabancı University, Istanbul
  20. Nivre J, Hall J, Nilsson J, Chanev A, Eryiğit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
  21. Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148
  22. Oflazer K (1996) Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. Comput Linguist 22(1):73–99
  23. Oflazer K (2008) Statistical machine translation into a morphologically complex language. In: Proceedings of CICLING, Haifa, pp 376–387
  24. Oflazer K (2014) Turkish and its challenges for language processing. Lang Resour Eval 48(4):639–653
  25. Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106
  26. Oflazer K, Kuruöz İ (1994) Tagging and morphological disambiguation of Turkish text. In: Proceedings of ANLP, Stuttgart, pp 144–149
  27. Oflazer K, Tür G (1996) Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In: Proceedings of EMNLP-VLC, Philadelphia, PA
  28. Oflazer K, Say B, Hakkani-Tür DZ, Tür G (2003) Building a Turkish Treebank. In: Treebanks: building and using parsed corpora. Kluwer Academic Publishers, Berlin
  29. Parlak S, Saraçlar M (2012) Performance analysis and improvement of Turkish broadcast news retrieval. IEEE Trans Audio Speech Lang Process 20(3):731–741
  30. Sak H (2011) Integrating morphology into automatic speech recognition: morpholexical and discriminative language models for Turkish. PhD thesis, Boğaziçi University, Istanbul
  31. Sak H, Güngör T, Saraçlar M (2007) Morphological disambiguation of Turkish text with perceptron algorithm. In: Proceedings of CICLING, Mexico City, pp 107–118
  32. Sak H, Güngör T, Saraçlar M (2011) Resources for Turkish morphological processing. Lang Resour Eval 45(2):249–261
  33. Saraçlar M (2012) Turkish broadcast news speech and transcripts (LDC2012S06). Resource available from Linguistic Data Consortium
  34. Stamou S, Oflazer K, Pala K, Christoudoulakis D, Cristea D, Tufis D, Koeva S, Totkov G, Dutoit D, Grigoriadou M (2002) Balkanet: a multilingual semantic network for Balkan languages. In: Proceedings of the first global WordNet conference, Mysore
  35. Wickwire DE (1987) The Sevmek Thesis, a grammatical analysis of the Turkish verb system illustrated by the verb sevmek-to love. Master’s thesis, Pacific Western University, San Diego, CA
  36. Yeniterzi R, Oflazer K (2010) Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. In: Proceedings of ACL, Uppsala, pp 454–464
  37. Yuret D, Türe F (2006) Learning morphological disambiguation rules for Turkish. In: Proceedings of NAACL-HLT, New York, NY, pp 328–334
  38. Zeyrek D, Turan ÜD, Bozşahin C, Çakıcı R, Sevdik-Çallı A, Demirşahin I, Aktaş B, Yalçınkaya İ, Ögel H (2009) Annotating subordinators in the Turkish Discourse Bank. In: Proceedings of the linguistic annotation workshop, Singapore, pp 44–47

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Carnegie Mellon University Qatar, Doha-Education City, Qatar
  2. Boğaziçi University, Istanbul, Turkey
