
\(\hbox {F}_0\) Post-Stress Rise Trends Consideration in Unit Selection TTS

  • Markéta Jůzová
  • Jan Volín
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

In spoken Czech, the stress and post-stress syllables are usually characterized by a rise in fundamental frequency \(\hbox {F}_0\) (except in phrase-final stress groups). In unit selection text-to-speech systems, however, no \(\hbox {F}_0\) contour is generated for the selection to follow, so the \(\hbox {F}_0\) behaviour is usually controlled only loosely. The paper presents an experiment in which a unit selection TTS system is made to follow the trends of fundamental frequency rise in synthesized speech, with the aim of achieving higher naturalness and overall quality of the synthesis.
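
The abstract does not say how the rise trend is enforced during selection. As a purely illustrative sketch, and not the authors' method, one could imagine a penalty term that is zero when the median \(\hbox {F}_0\) rises from the stress syllable to the post-stress syllable and grows when it falls. The function names, the semitone scale, the `min_rise` threshold and the `weight` parameter below are all hypothetical.

```python
import math
from statistics import median


def f0_rise_semitones(stress_f0, post_stress_f0):
    """Change in median F0 (Hz) from the stress syllable to the
    post-stress syllable, expressed in semitones.

    Both arguments are assumed to hold voiced-frame F0 values only.
    """
    return 12.0 * math.log2(median(post_stress_f0) / median(stress_f0))


def rise_penalty(stress_f0, post_stress_f0, phrase_final=False,
                 min_rise=0.0, weight=1.0):
    """Hypothetical penalty for a candidate unit sequence.

    Phrase-final stress groups are exempt, following the exception
    mentioned in the abstract. Otherwise the penalty is proportional
    to how far the observed rise falls short of `min_rise`.
    """
    if phrase_final:
        return 0.0
    rise = f0_rise_semitones(stress_f0, post_stress_f0)
    return weight * max(0.0, min_rise - rise)


# A rising candidate is not penalized; a falling one is.
rising = rise_penalty([110.0, 112.0, 114.0], [120.0, 124.0, 126.0])
falling = rise_penalty([120.0, 124.0, 126.0], [110.0, 112.0, 114.0])
```

In a real system such a term would have to be weighed against the existing target and concatenation costs; the sketch only shows the shape of the idea.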

Keywords

Unit selection · Stress and post-stress syllables · \(\hbox {F}_0\) rise

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. New Technologies for the Information Society and Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
  2. Institute of Phonetics, Faculty of Arts, Charles University, Prague, Czech Republic
