Corpus-Based Unit Selection TTS for Hungarian

  • Márk Fék
  • Péter Pesti
  • Géza Németh
  • Csaba Zainkó
  • Gábor Olaszy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)

Abstract

This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The system generates weather forecasts. A speech corpus of 5260 recorded sentences, containing 11 hours of continuous speech, was created. A Hungarian speech recognizer was used to label speech sound boundaries, and word boundaries were also marked automatically. Unit selection follows a top-down hierarchical scheme that uses words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared with that of two earlier Hungarian TTS systems in a subjective listening test with 221 listeners. On a five-point mean opinion score (MOS) scale, the experimental system scored 3.92, the earlier unit concatenation TTS system 2.63, the formant synthesizer 1.24, and natural speech 4.86.
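
The top-down scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the selection criteria, so the sketch assumes an exact match on the word and a coarse three-way position tag, and it omits the target and concatenation costs that a full unit selection system would normally use. The names `Unit`, `position_in_phrase`, `select_units`, the position tags, and the letter-splitting stand-in for phonetic transcription are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    kind: str      # "word" or "sound"
    label: str     # the word itself or a speech-sound symbol
    position: str  # hypothetical prosodic tag: "first", "middle", "last"


def position_in_phrase(index, phrase_len):
    """Coarse relative position of a word within its prosodic phrase."""
    if index == 0:
        return "first"
    if index == phrase_len - 1:
        return "last"
    return "middle"


def select_units(phrase, word_inventory, to_sounds):
    """Top-down selection: prefer whole words, fall back to speech sounds.

    word_inventory is a set of (word, position) pairs assumed to be
    available in the recorded corpus; to_sounds maps a word to a
    sequence of speech-sound symbols.
    """
    selected = []
    for i, word in enumerate(phrase):
        pos = position_in_phrase(i, len(phrase))
        if (word, pos) in word_inventory:
            # A recorded word in a matching phrase position exists.
            selected.append(Unit("word", word, pos))
        else:
            # No suitable word unit: descend to the speech-sound level.
            selected.extend(Unit("sound", s, pos) for s in to_sounds(word))
    return selected


# Toy inventory; list() splits a word into letters as a crude
# placeholder for a real phonetic transcription.
inventory = {("holnap", "first"), ("eső", "last")}
units = select_units(["holnap", "esik", "eső"], inventory, list)
for unit in units:
    print(unit.kind, unit.label, unit.position)
```

In this toy run the first and last words are taken as whole-word units, while the middle word, which is missing from the inventory, is assembled from speech-sound units.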

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Márk Fék (1)
  • Péter Pesti (1)
  • Géza Németh (1)
  • Csaba Zainkó (1)
  • Gábor Olaszy (1)
  1. Laboratory of Speech Technology, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
