Automatic Segmentation of Parasitic Sounds in Speech Corpora for TTS Synthesis

  • Jindřich Matoušek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6231)

Abstract

This paper presents automatic segmentation of parasitic speech sounds in speech corpora for text-to-speech (TTS) synthesis. Besides the automatic detection of the presence of such sounds, automatic segmentation is an important step in their precise localisation in speech corpora. The main goal of this study is to find out whether the segmentation of these sounds is accurate enough to enable cutting the sounds out of synthetic speech or modelling them explicitly during synthesis. An HMM-based classifier was employed to detect the parasitic sounds and, simultaneously, to find the boundaries between these sounds and the surrounding phones. The results show that the accuracy of automatic segmentation of parasitic sounds is comparable to that of other phones, which indicates that cutting out or explicitly using parasitic sounds should be possible.

Keywords

parasitic speech sound, speech synthesis, unit selection, HMM, automatic phonetic segmentation

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jindřich Matoušek
    • 1
  1. Faculty of Applied Sciences, Dept. of Cybernetics, University of West Bohemia, Plzeň, Czech Republic
