Basic Research and Implementation Decisions for a Text-to-Speech Synthesis System in Romanian

Burileanu, Dragos

doi:10.1023/A:1020236605813

Published: September 2002

Volume 5, pages 211–225, (2002)

International Journal of Speech Technology

Dragos Burileanu¹

Abstract

Speech synthesis is one of the most language-dependent domains of speech technology. In particular, the natural language processing stage of a text-to-speech (TTS) system holds most of the linguistic knowledge about a given language. Building a high-quality TTS system for a new language therefore involves many theoretical and technical challenges. Above all, extensive studies at the linguistic level must exist (or be carried out) to endow the system with the most relevant language information; this is an essential condition for obtaining truly natural synthesized speech from unrestricted input text. This paper presents fundamental research and the related implementation issues in developing a complete TTS system for Romanian, emphasizing the particularities of the language and their influence on the efficiency of the language processing stage. The first section describes our standpoint on TTS synthesis and the overall architecture of our TTS system. The following sections formulate several important tasks of the natural language processing stage (input text preprocessing, letter-to-phone conversion, acoustic database preparation) and discuss the design philosophy of the corresponding modules, the implementation decisions, and the evaluation experiments. A separate section is devoted to an acoustic-phonetic study that guided the phone-set selection and the generation of the acoustic database. The paper ends with conclusions and a description of the work currently in progress at other levels of the TTS system.

Author information

Authors and Affiliations

Speech Technology and Signal Processing Laboratory, Faculty of Electronics and Telecommunications, “Politehnica” University of Bucharest, Romania
Dragos Burileanu

About this article

Cite this article

Burileanu, D. Basic Research and Implementation Decisions for a Text-to-Speech Synthesis System in Romanian. International Journal of Speech Technology 5, 211–225 (2002). https://doi.org/10.1023/A:1020236605813

Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1020236605813
