International Journal of Speech Technology, Volume 3, Issue 3–4, pp 187–200

Prosody Prediction from Text in Hungarian and its Realization in TTS Conversion

  • Ilona Koutny
  • Gábor Olaszy
  • Péter Olaszi
Article

Abstract

Proper prosodic structure is crucial for natural-sounding synthesized speech. Because no other information on discourse structure is available, we have to rely on syntactic structure to predict the main prosodic items for normal speech. To meet this requirement, a dependency-based parser has been developed for Hungarian that marks the boundaries of functional constituents in the sentence, in other words, the places where new intonation patterns start and breaks can be inserted. We determine the stress distribution in the sentence using four levels, including focus. The practical realization of the prosodic predictor also relies on statistical and empirical data. The intonation units (tone groups) with their proper melody (e.g., falling, slowly falling, level, rising, slowly rising, rising-falling, and falling-rising) are established on the basis of syntactic properties in declarative, interrogative, and imperative sentences. The results are embedded in an experimental Hungarian text-to-speech (TTS) system.
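
As a rough illustration of how the elements named in the abstract could fit together, the following Python sketch represents tone groups with one of the seven listed melodies and words with one of four stress levels, then superposes a sentence-level declination, a per-group melody shape, and a per-word accent. It is not the authors' implementation: the stress level names, the semitone values in `MELODY_SHAPE`, the `declination` and `accent` parameters, and the piecewise-linear interpolation are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import List


class Melody(Enum):
    """The seven melody types listed in the abstract."""
    FALLING = "falling"
    SLOWLY_FALLING = "slowly falling"
    LEVEL = "level"
    RISING = "rising"
    SLOWLY_RISING = "slowly rising"
    RISING_FALLING = "rising-falling"
    FALLING_RISING = "falling-rising"


class Stress(IntEnum):
    """Four stress levels including focus (the level names are assumed)."""
    NONE = 0
    WEAK = 1
    STRONG = 2
    FOCUS = 3


@dataclass
class ToneGroup:
    melody: Melody
    stresses: List[Stress]  # one entry per word in the group


# Hypothetical (start, middle, end) offsets in semitones for each melody.
MELODY_SHAPE = {
    Melody.FALLING: (4.0, 0.0, -4.0),
    Melody.SLOWLY_FALLING: (2.0, 0.0, -2.0),
    Melody.LEVEL: (0.0, 0.0, 0.0),
    Melody.RISING: (-4.0, 0.0, 4.0),
    Melody.SLOWLY_RISING: (-2.0, 0.0, 2.0),
    Melody.RISING_FALLING: (-2.0, 4.0, -4.0),
    Melody.FALLING_RISING: (2.0, -4.0, 4.0),
}


def melody_offset(melody: Melody, position: float) -> float:
    """Piecewise-linear melody shape; position runs from 0.0 to 1.0."""
    start, middle, end = MELODY_SHAPE[melody]
    if position <= 0.5:
        return start + (middle - start) * (position / 0.5)
    return middle + (end - middle) * ((position - 0.5) / 0.5)


def pitch_contour(groups: List[ToneGroup],
                  declination: float = -0.5,
                  accent: float = 1.0) -> List[float]:
    """Return one pitch value per word, in semitones relative to a baseline.

    Each value is the sum (superposition) of three assumed components:
    a sentence-level declination, the tone group's melody shape, and an
    accent proportional to the word's stress level.
    """
    contour = []
    word_index = 0
    for group in groups:
        n = len(group.stresses)
        for i, stress in enumerate(group.stresses):
            position = i / (n - 1) if n > 1 else 0.0
            value = (declination * word_index
                     + melody_offset(group.melody, position)
                     + accent * int(stress))
            contour.append(value)
            word_index += 1
    return contour
```

For example, `pitch_contour([ToneGroup(Melody.LEVEL, [Stress.FOCUS, Stress.NONE]), ToneGroup(Melody.FALLING, [Stress.WEAK, Stress.NONE])])` yields one value per word. In a real system the tone groups and stress levels would come from the syntactic parser rather than being written by hand.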

Keywords: prosody prediction; parsing; tone groups; superposition of pitch patterns; TTS synthesis

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Ilona Koutny (1)
  • Gábor Olaszy (2)
  • Péter Olaszi (3)
  1. Adam Mickiewicz University, Institute of Linguistics, Poznań, Poland
  2. Kempelen Farkas Speech Research Laboratory, Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Pf. 19, Hungary
  3. Department of Telecommunications and Telematics, Budapest University of Technology and Economics, Budapest, Pázmány Péter sétány 1/D.
