Abstract
The model of prosody used in the Aculab TTS system is unusual in several respects. Firstly, it is based firmly on current metrical theories of prosody. Secondly, it is entirely knowledge-based: there are no stochastic components in the model. Thirdly, it makes use of a quasi-random element to avoid the predictability of conventional synthetic prosody. Fourthly, it is specifically designed for multilingual use: it currently handles several Germanic and Romance languages.
Cite this article
Monaghan, A. A Metrical Model of Prosody for Multilingual TTS. International Journal of Speech Technology 6, 73–81 (2003). https://doi.org/10.1023/A:1021056124145