A Metrical Model of Prosody for Multilingual TTS

Monaghan, A.I.C.

doi:10.1023/A:1021056124145

Published: January 2003

Volume 6, pages 73–81, (2003)

International Journal of Speech Technology

Abstract

The model of prosody used in the Aculab TTS system is unusual in several respects. Firstly, it is based firmly on current metrical theories of prosody. Secondly, it is entirely knowledge-based: there are no stochastic components in the model. Thirdly, it makes use of a quasi-random element to avoid the predictability of conventional synthetic prosody. Fourthly, it is specifically designed for multilingual use: it currently handles several Germanic and Romance languages.

Author information

Authors and Affiliations

Aculab, Milton Keynes, MK1 1PT, UK
A.I.C. Monaghan

About this article

Cite this article

Monaghan, A. A Metrical Model of Prosody for Multilingual TTS. International Journal of Speech Technology 6, 73–81 (2003). https://doi.org/10.1023/A:1021056124145

Issue Date: January 2003
DOI: https://doi.org/10.1023/A:1021056124145
