A Small Footprint Hybrid Statistical and Unit Selection Text-to-Speech Synthesis System for Turkish

Conference paper


Unit selection based text-to-speech (TTS) synthesis can generate high-quality speech. However, HMM-based text-to-speech (HTS) also has advantages, such as the absence of the spurious errors observed in the unit selection scheme. Another advantage is its small memory footprint. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims to improve the quality of the baseline HTS system while keeping the memory footprint small. In A/B preference tests, listeners preferred the hybrid system over a state-of-the-art HTS baseline system.
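The abstract contrasts unit selection with HMM-based synthesis but does not describe how the hybrid combines them. As a generic illustration only, and not the authors' method, the sketch below shows the standard Viterbi search behind unit selection: each target slot has a list of candidate units, and the search picks the sequence that minimises the sum of target costs and join costs. To hint at how a hybrid can mix the two paradigms, each slot also offers a statistically generated unit that carries a fixed quality penalty. The `Unit` class, `select_units`, the single pitch feature, the cost functions, and all numbers are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    name: str                # hypothetical id, e.g. "db:a1" (recorded) or "hmm:1" (generated)
    pitch: float             # single toy feature (Hz) used by the illustrative costs
    generated: bool = False  # True if produced by the statistical model (hypothetical)


def select_units(
    targets: Sequence[float],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[float, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Viterbi search for the candidate sequence with minimal target + join cost."""
    n = len(targets)
    if n == 0:
        return [], 0.0
    # best[i][j]: lowest total cost of any path ending in candidates[i][j]
    best: List[List[float]] = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            # Cheapest predecessor, including the cost of joining it to u.
            k, cost = min(
                ((k, best[i - 1][k] + join_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda kc: kc[1],
            )
            row.append(cost + target_cost(targets[i], u))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[Unit] = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


def target_cost(t: float, u: Unit) -> float:
    # Hypothetical: generated units match the target but pay a fixed quality penalty.
    return 2.0 if u.generated else abs(t - u.pitch)


def join_cost(a: Unit, b: Unit) -> float:
    # Hypothetical: penalise pitch discontinuities at the concatenation point.
    return 0.5 * abs(a.pitch - b.pitch)


# Toy data: the database has no good match for the second target (120 Hz).
targets = [100.0, 120.0, 110.0]
candidates = [
    [Unit("db:a1", 102.0), Unit("db:a2", 90.0), Unit("hmm:0", 100.0, True)],
    [Unit("db:b1", 140.0), Unit("hmm:1", 120.0, True)],
    [Unit("db:c1", 111.0), Unit("hmm:2", 110.0, True)],
]
path, total = select_units(targets, candidates, target_cost, join_cost)
print([u.name for u in path], total)  # ['db:a1', 'hmm:1', 'db:c1'] 18.5
```

In this toy example the search keeps recorded units where the database matches the target well and switches to the generated unit only for the slot where no recorded candidate is close, which is the intuition behind trading a small footprint against the naturalness of concatenated speech.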


Keywords: Speech synthesis · Hybrid TTS · HMM-based TTS · Turkish TTS · Small memory footprint · Agglutinative languages



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

Ozyegin University, Istanbul, Turkey
