Turkish Speech Recognition

  • Ebru Arısoy
  • Murat Saraçlar
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Automatic speech recognition (ASR) is one of the most important applications of speech and language processing, as it forms the bridge between spoken and written language processing. This chapter presents an overview of the foundations of ASR, followed by a summary of Turkish language resources for ASR and a review of various Turkish ASR systems. Language resources include acoustic and text corpora as well as linguistic tools such as morphological parsers, morphological disambiguators, and dependency parsers, which are discussed in more detail in other chapters. Turkish ASR systems vary in the type and amount of data used for building the models. Most research on Turkish ASR focuses on the language modeling component, which is covered in Chap. 4.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. MEF University, Istanbul, Turkey
  2. Boğaziçi University, Istanbul, Turkey
