Abstract
This paper presents the application of causal retro-causal neural networks (NNs) to accent label prediction for speech synthesis. The proposed NN architecture uses gating clusters, which enable the network structure to adapt dynamically to the current input. In the proposed causal retro-causal NN, the gating clusters adapt the structure so that the network has a variable context length; only the input feature vectors that are actually available in the current context window are processed. The architecture has been applied successfully to accent label prediction within our text-to-speech (TTS) system. Prediction accuracy is about 83%, which is higher than results achieved with tree-based (CART) methods on a corpus of similar complexity.
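The variable-context idea can be illustrated with a minimal sketch. It is not the architecture from the paper: the function names `context_gates` and `gated_context_sum`, the fixed per-slot weights, and the toy feature vectors are all assumptions made for illustration. The sketch only shows how a gate of 0 can switch off window slots for which no input feature vector exists, for example at the start or end of a sentence.

```python
from typing import List, Sequence

Vector = List[float]


def context_gates(seq_len: int, t: int, radius: int) -> List[float]:
    """Hypothetical gate: 1.0 if window slot t+offset lies inside the sequence, else 0.0."""
    return [1.0 if 0 <= t + off < seq_len else 0.0
            for off in range(-radius, radius + 1)]


def gated_context_sum(features: Sequence[Vector], t: int, radius: int,
                      weights: Sequence[float]) -> Vector:
    """Weighted sum over the context window; gated-off slots contribute nothing."""
    dim = len(features[0])
    gates = context_gates(len(features), t, radius)
    out = [0.0] * dim
    for slot, gate in enumerate(gates):
        if gate == 0.0:
            continue  # no input vector is available at this slot
        vec = features[t - radius + slot]
        for d in range(dim):
            out[d] += gate * weights[slot] * vec[d]
    return out


# Toy data (assumed): three positions, two features each, window radius 1.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, 1.0, 0.5]  # one illustrative weight per window slot

print(context_gates(len(features), 0, 1))          # left slot is gated off at t=0
print(gated_context_sum(features, 0, 1, weights))  # uses only positions 0 and 1
print(gated_context_sum(features, 1, 1, weights))  # full window is available
```

At the first position the left neighbour does not exist, so its gate is 0 and the effective context shrinks to two slots; in the middle of the sequence all three slots are active. A trained network would learn its weights rather than use fixed ones, and the paper's gating clusters act on the network structure rather than on a plain weighted sum.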
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müller, A.F., Zimmermann, H.G. (2001). Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_9
DOI: https://doi.org/10.1007/3-540-44668-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2
eBook Packages: Springer Book Archive