Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length

Müller, Achim F.; Zimmermann, Hans Georg

doi:10.1007/3-540-44668-0_9


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2130)

Included in the following conference series:

International Conference on Artificial Neural Networks


Abstract

This paper presents the application of causal retro-causal neural networks (NNs) to accent label prediction for speech synthesis. The proposed NN architecture uses gating clusters, which enable the network structure to adapt dynamically to the current input. In the proposed causal retro-causal NN, the gating clusters adapt the network structure so that the network has a variable context length. As a result, only the input feature vectors that are actually available in the current context window are processed. The proposed architecture has been applied successfully to accent label prediction within our text-to-speech (TTS) system. Prediction accuracy is about 83%, which is higher than results achieved with tree-based (CART) methods on a corpus of similar complexity.
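The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes a forward (causal) and a backward (retro-causal) recurrent branch that meet at the center of the context window, plus a binary gate per window position that skips the update when no feature vector is available there. All names (`step`, `predict`, `random_params`, the weight keys) and the pass-through behaviour of a closed gate are assumptions made for illustration.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def step(state, x, gate, w_state, w_in):
    """One recurrent update. Assumption: a closed gate (0) leaves the state unchanged."""
    if gate == 0:
        return state
    pre = [a + b for a, b in zip(matvec(w_state, state), matvec(w_in, x))]
    return [math.tanh(p) for p in pre]


def predict(window, gates, params):
    """Return class probabilities for the center position of a context window.

    window: list of feature vectors (arbitrary placeholders where the gate is 0)
    gates:  list of 0/1 flags, 1 where a real feature vector is available
    """
    center = len(window) // 2
    hidden = len(params["w_causal"])
    # Causal branch: left to right, up to and including the center.
    s = [0.0] * hidden
    for t in range(center + 1):
        s = step(s, window[t], gates[t], params["w_causal"], params["w_in_c"])
    # Retro-causal branch: right to left, down to and including the center.
    r = [0.0] * hidden
    for t in range(len(window) - 1, center - 1, -1):
        r = step(r, window[t], gates[t], params["w_retro"], params["w_in_r"])
    # Softmax over the concatenated branch states.
    logits = matvec(params["w_out"], s + r)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_params(n_in, hidden, n_classes, seed=0):
    """Hypothetical untrained weights, only to make the sketch executable."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_causal": mat(hidden, hidden),
        "w_in_c": mat(hidden, n_in),
        "w_retro": mat(hidden, hidden),
        "w_in_r": mat(hidden, n_in),
        "w_out": mat(n_classes, 2 * hidden),
    }
```

With gates such as `[1, 1, 1, 0, 0]`, the two right-most window positions never enter the computation, so the effective context is shorter on that side and whatever placeholder values sit in those slots have no effect on the prediction.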



Author information

Authors and Affiliations

Siemens Corporate Technology, Otto-Hahn-Ring 6, D-81739, Munich, Germany
Achim F. Müller & Hans Georg Zimmermann


Editor information

Editors and Affiliations

Dept. of Medical Cybernetics and Artificial Intelligence, University of Vienna, Freyung 6/2, 1010, Vienna, Austria
Georg Dorffner
Institute for Computer Aided Automation Pattern Recognition and Image Processing Group, Technical University of Vienna, Favoritenstr. 9/1832, 1040, Vienna, Austria
Horst Bischof
Institut für Statistik, Wirtschaftsuniversität Wien, Augasse 2-6, 1090, Wien, Austria
Kurt Hornik


About this paper

Cite this paper

Müller, A.F., Zimmermann, H.G. (2001). Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_9


DOI: https://doi.org/10.1007/3-540-44668-0_9
Published: 17 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2
eBook Packages: Springer Book Archive
