International Journal of Speech Technology, Volume 3, Issue 3–4, pp 201–215

Profivox—A Hungarian Text-to-Speech System for Telecommunications Applications

  • G. Olaszy
  • G. Németh
  • P. Olaszi
  • G. Kiss
  • Cs. Zainkó
  • G. Gordos
Article

Abstract

The latest Hungarian text-to-speech (TTS) system developed for telephone-based applications is described. Its main features are an intelligible, human-like voice; robust software designed for continuous operation; fully automatic conversion of declarative sentences (both short and very long) and questions; and real-time parallel operation on at least 30 channels. The concepts of prosody generation and sound duration processing are introduced, and the Profivox development environment is presented. The market-leading Hungarian mobile service provider uses the TTS system in an automatic e-mail reading application.

Keywords: Hungarian TTS; robust fast software; human-like voice; specific sound durations

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • G. Olaszy (1)
  • G. Németh (1)
  • P. Olaszi (1)
  • G. Kiss (1)
  • Cs. Zainkó (1)
  • G. Gordos (1)
  1. Department of Telecommunications and Telematics, Budapest University of Technology and Economics, Budapest, Hungary
