The Role of Nasal Contexts on Quality of Vowel Concatenations

  • Milan Legát
  • Radek Skarnitzl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7499)

Abstract

This paper addresses a long-standing problem in concatenative speech synthesis: audible discontinuities at concatenation points at diphone boundaries. We present an analysis of how nasal context mismatches affect the quality of concatenations in five short Czech vowels. The study was conducted with two voices (one male and one female), and the results suggest that, in the female voice, the vowels /a/, /e/ and /o/ are prone to concatenation discontinuities due to nasalized contexts.

Keywords

speech synthesis, unit selection, concatenation cost, nasality, phase mismatch, pitch marks

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Milan Legát (1)
  • Radek Skarnitzl (2)
  1. Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
  2. Faculty of Arts, Institute of Phonetics, Charles University in Prague, Czech Republic
