Special Speech Synthesis for Social Network Websites

  • Csaba Zainkó
  • Tamás Gábor Csapó
  • Géza Németh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6231)

Abstract

This paper gives an overview of the design concepts and implementation of a Hungarian microblog reading system. Speech synthesis of such text requires specialized components. First, an efficient diacritic reconstruction algorithm was applied; the accuracy of an earlier dictionary-based method was improved with machine learning so that ambiguous cases are handled properly. Second, an unlimited-domain text-to-speech synthesizer was applied with extensions for emotional and spontaneous styles. Chat and blog texts often contain "emoticons" that mark the emotional state of the user, so an expressive speech synthesis method was adapted to a corpus-based synthesizer. Four emotions were generated and evaluated in a listening test: neutral, happy, angry and sad. The experiments showed that happy and sad emotions can be generated with this algorithm, with the best accuracy for the female voice.
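As an illustration of the dictionary-based starting point mentioned above, the following minimal sketch restores diacritics by looking up each unaccented word in a lexicon built from accented text. It is not the authors' implementation: the function names, the toy word list, and the frequency-based tie-break are assumptions made for this example. In particular, picking the most frequent variant merely stands in for the machine-learning step the abstract describes for ambiguous cases.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks, so that 'kör' and 'kór' both become 'kor'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lexicon(corpus_words):
    """Map each unaccented form to a frequency count of its accented variants."""
    lexicon = defaultdict(Counter)
    for word in corpus_words:
        lowered = word.lower()  # this sketch ignores letter case
        lexicon[strip_diacritics(lowered)][lowered] += 1
    return lexicon


def restore(word: str, lexicon):
    """Return (restored_word, is_ambiguous) for one unaccented input word.

    Unknown words are returned unchanged. When several accented variants
    share the same unaccented form, the most frequent one is chosen
    (an assumption of this sketch) and the word is flagged as ambiguous.
    """
    candidates = lexicon.get(word.lower())
    if not candidates:
        return word, False
    best, _count = candidates.most_common(1)[0]
    return best, len(candidates) > 1


# Hypothetical toy corpus; a real system would use a large accented corpus.
toy_corpus = ["kör", "kor", "kör", "kór", "szép", "őrült"]
lexicon = build_lexicon(toy_corpus)

for token in ["szep", "kor", "orult", "xyz"]:
    print(token, "->", restore(token, lexicon))
```

The ambiguity flag marks exactly the words for which a plain dictionary lookup is insufficient, such as the unaccented form "kor", which can stand for "kor", "kór" or "kör". Those are the cases where a context-aware classifier would be consulted instead of a frequency count.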

Keywords

diacritic restoration, emotional speech synthesis, microblog reading system, chat-to-speech

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Csaba Zainkó (1)
  • Tamás Gábor Csapó (1)
  • Géza Németh (1)
  1. Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
