Turkish and Its Challenges for Language and Speech Processing

Turkish Natural Language Processing

Abstract

We present a short survey and exposition of some of the important aspects of Turkish that have proved interesting and challenging for natural language and speech processing. Most of the challenges stem from the complex morphology of Turkish and from the way morphology interacts with syntax. Finally, we provide a short overview of the major tools and resources developed for Turkish over the last two decades. (Parts of this chapter were previously published as Oflazer (Lang Resour Eval 48(4):639–653, 2014).)

Notes

  1. These numbers were obtained with xfst, the Xerox finite state tool (Beesley and Karttunen 2003): the output was restricted, through composition, to the respective root words and to the given number of symbols marking a derivational morpheme, and the resulting possible words were then counted.

  2. See Wickwire (1987) for an interesting take on this.

  3. A couple of suffixes can, at least theoretically, be used iteratively. The causative morpheme is one of them, but in practice at most three are used, and even then it is hard to track who is doing what to whom.

  4. One constraint usually mentioned is that indefinite (and nominative marked) direct objects move with the verb, but valid violations of this constraint are observed in speech (Sarah Kennelly, personal communication).

  5. Although we have written out the root word explicitly here, whenever convenient we will assume that the root word is part of the first inflectional group.

  6. uzak is far/distant; the morphological features other than the obvious part-of-speech features are: +Become: become verb, +Caus: causative verb, +Pass: passive verb, +Pos: positive polarity, +FutPart: derived future participle, +Pnon: no possessive agreement.

  7. Here we show surface dependency relations, with each link going from the dependent to the head.

  8. The pre-trained MaltParser model and configuration files for Turkish can be downloaded from https://web.itu.edu.tr/gulsenc/TurkishDepModel.html (Accessed Sept. 14, 2017).

  9. See also ParGram/ParSem, an international collaboration on LFG-based grammar and semantics development: https://pargram.b.uib.no (Accessed Sept. 14, 2017).

  10. Available at https://web.itu.edu.tr/gulsenc/treebanks.html (Accessed Sept. 14, 2017).
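
The notes on inflectional groups and morphological features can be illustrated with a minimal sketch. It splits a morphological analysis string into inflectional groups and attaches the root word to the first group, as the note on root words describes. The analysis string below is hypothetical: it is assembled from the features listed in the note on uzak, and the `^DB` derivation-boundary marker, the tag order, and the names `split_inflectional_groups` and `InflectionalGroup` are assumptions made for illustration, not taken from the chapter.

```python
from typing import List, NamedTuple, Optional

# Assumed notation: "^DB" separates inflectional groups at derivation boundaries.
DERIVATION_BOUNDARY = "^DB"


class InflectionalGroup(NamedTuple):
    root: Optional[str]   # only the first group carries the root word
    features: List[str]   # feature tags without the leading "+"


def split_inflectional_groups(analysis: str) -> List[InflectionalGroup]:
    """Split an analysis string into inflectional groups (illustrative only)."""
    groups: List[InflectionalGroup] = []
    for index, chunk in enumerate(analysis.split(DERIVATION_BOUNDARY)):
        parts = [part for part in chunk.split("+") if part]
        if index == 0:
            if not parts:
                raise ValueError("analysis has no root word")
            # The root word is treated as part of the first inflectional group.
            groups.append(InflectionalGroup(parts[0], parts[1:]))
        else:
            groups.append(InflectionalGroup(None, parts))
    return groups


# Hypothetical analysis built from the features named in the notes.
analysis = "uzak+Adj^DB+Verb+Become^DB+Verb+Caus^DB+Verb+Pass+Pos^DB+Adj+FutPart+Pnon"
groups = split_inflectional_groups(analysis)
for group in groups:
    print(group.root, group.features)
```

Keeping the root only on the first group mirrors the convention stated in the notes; a real analyzer's output format may differ and would need its own parsing rules.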

References

  • Aksan Y, Aksan M, Koltuksuz A, Sezer T, Mersinli Ü, Demirhan UU, Yılmazer H, Kurtoğlu Ö, Öz S, Yıldız İ (2012) Construction of the Turkish National Corpus (TNC). In: Proceedings of LREC, Istanbul, pp 3223–3227

  • Arısoy E (2009) Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition. PhD thesis, Boğaziçi University, Istanbul

  • Arısoy E, Can D, Parlak S, Sak H, Saraçlar M (2009) Turkish broadcast news transcription and retrieval. IEEE Trans Audio Speech Lang Process 17(5):874–883

  • Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford University, Stanford

  • Bilgin O, Çetinoğlu Ö, Oflazer K (2004) Building a Wordnet for Turkish. Rom J Inf Sci Technol 7(1–2):163–172

  • Buchholz S, Marsi E (2006) CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of CONLL, New York, NY, pp 149–164

  • Butt M, Dyvik H, King TH, Masuichi H, Rohrer C (2002) The parallel grammar project. In: Proceedings of the workshop on grammar engineering and evaluation, Taipei, pp 1–7

  • Can F, Koçberber S, Balçık E, Kaynak C, Öcalan HC, Vursavaş OM (2008) Information retrieval on Turkish texts. J Am Soc Inf Sci Technol 59(3):407–421

  • Çetinoğlu Ö (2009) A large scale LFG grammar for Turkish. PhD thesis, Sabancı University, Istanbul

  • Chelba C, Hazen TJ, Saraçlar M (2008) Retrieval and browsing of spoken content. IEEE Signal Process Mag 25(3):39–49

  • Durgar-El Kahlout İ (2009) A prototype English-Turkish statistical machine translation system. PhD thesis, Sabancı University, Istanbul

  • Durgar-El Kahlout İ, Oflazer K (2010) Exploiting morphology and local word reordering in English-to-Turkish phrase-based statistical machine translation. IEEE Trans Audio Speech Lang Process 18(6):1313–1322

  • Eryiğit G, Oflazer K (2006) Statistical dependency parsing of Turkish. In: Proceedings of EACL, Trento, pp 89–96

  • Eryiğit G, Nivre J, Oflazer K (2008) Dependency parsing of Turkish. Comput Linguist 34(3):357–389

  • Göksel A, Kerslake C (2005) Turkish: a comprehensive grammar. Routledge, London

  • Hakkani-Tür DZ, Oflazer K, Tür G (2002) Statistical morphological disambiguation for agglutinative languages. Comput Hum 36(4):381–410

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL, Prague, pp 177–180

  • Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki

  • Külekçi MO (2006) Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish. PhD thesis, Sabancı University, Istanbul

  • Nivre J, Hall J, Nilsson J, Chanev A, Eryiğit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135

  • Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148

  • Oflazer K (1996) Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. Comput Linguist 22(1):73–99

  • Oflazer K (2008) Statistical machine translation into a morphologically complex language. In: Proceedings of CICLING, Haifa, pp 376–387

  • Oflazer K (2014) Turkish and its challenges for language processing. Lang Resour Eval 48(4):639–653

  • Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106

  • Oflazer K, Kuruöz İ (1994) Tagging and morphological disambiguation of Turkish text. In: Proceedings of ANLP, Stuttgart, pp 144–149

  • Oflazer K, Tür G (1996) Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In: Proceedings of EMNLP-VLC, Philadelphia, PA

  • Oflazer K, Say B, Hakkani-Tür DZ, Tür G (2003) Building a Turkish Treebank. In: Treebanks: building and using parsed corpora. Kluwer Academic Publishers, Berlin

  • Parlak S, Saraçlar M (2012) Performance analysis and improvement of Turkish broadcast news retrieval. IEEE Trans Audio Speech Lang Process 20(3):731–741

  • Sak H (2011) Integrating morphology into automatic speech recognition: morpholexical and discriminative language models for Turkish. PhD thesis, Boğaziçi University, Istanbul

  • Sak H, Güngör T, Saraçlar M (2007) Morphological disambiguation of Turkish text with perceptron algorithm. In: Proceedings of CICLING, Mexico City, pp 107–118

  • Sak H, Güngör T, Saraçlar M (2011) Resources for Turkish morphological processing. Lang Resour Eval 45(2):249–261

  • Saraçlar M (2012) Turkish broadcast news speech and transcripts (LDC2012S06). Resource available from Linguistic Data Consortium

  • Stamou S, Oflazer K, Pala K, Christoudoulakis D, Cristea D, Tufis D, Koeva S, Totkov G, Dutoit D, Grigoriadou M (2002) Balkanet: a multilingual semantic network for Balkan languages. In: Proceedings of the first global WordNet conference, Mysore

  • Wickwire DE (1987) The Sevmek Thesis, a grammatical analysis of the Turkish verb system illustrated by the verb sevmek-to love. Master’s thesis, Pacific Western University, San Diego, CA

  • Yeniterzi R, Oflazer K (2010) Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. In: Proceedings of ACL, Uppsala, pp 454–464

  • Yuret D, Türe F (2006) Learning morphological disambiguation rules for Turkish. In: Proceedings of NAACL-HLT, New York, NY, pp 328–334

  • Zeyrek D, Turan ÜD, Bozşahin C, Çakıcı R, Sevdik-Çallı A, Demirşahin I, Aktaş B, Yalçınkaya İ, Ögel H (2009) Annotating subordinators in the Turkish Discourse Bank. In: Proceedings of the linguistic annotation workshop, Singapore, pp 44–47

Corresponding author

Correspondence to Kemal Oflazer.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Oflazer, K., Saraçlar, M. (2018). Turkish and Its Challenges for Language and Speech Processing. In: Oflazer, K., Saraçlar, M. (eds) Turkish Natural Language Processing. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-90165-7_1

  • DOI: https://doi.org/10.1007/978-3-319-90165-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90163-3

  • Online ISBN: 978-3-319-90165-7

  • eBook Packages: Computer Science (R0)
