Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length

  • Conference paper
Artificial Neural Networks — ICANN 2001 (ICANN 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2130)

Abstract

This paper presents the application of causal retro-causal neural networks (NNs) to accent label prediction for speech synthesis. The proposed architecture uses gating clusters, which enable the network structure to adapt dynamically to the current input. Here the gating clusters adapt the structure so that the network has a variable context length: only the input feature vectors actually available within the current context window are processed. The architecture has been applied successfully to accent label prediction within our text-to-speech (TTS) system. Prediction accuracy is about 83%, which is higher than results achieved with tree-based (CART) methods on a corpus of similar complexity.
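The variable context length can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify; it only assumes that each slot of a fixed-size context window carries a 0/1 gate, and that a causal state (running from the past towards the current word) and a retro-causal state (running from the future towards it) are updated only at slots whose gate is open. The function names, the scalar features, the `tanh` update, and the `decay` parameter are all hypothetical choices made for illustration.

```python
import math


def context_gates(position, length, left, right):
    """Return one 0/1 gate per window slot: 1 if the slot lies inside the sequence."""
    return [1 if 0 <= position + offset < length else 0
            for offset in range(-left, right + 1)]


def gated_states(features, position, left=2, right=2, decay=0.5):
    """Compute a causal and a retro-causal state for one position.

    `features` is a list of scalar inputs (a stand-in for feature vectors).
    Slots whose gate is 0 are skipped, so the effective context length
    shrinks near the sequence boundaries. This is an illustrative
    assumption, not the published model.
    """
    gates = context_gates(position, len(features), left, right)
    offsets = list(range(-left, right + 1))

    # Causal pass: oldest available slot up to the current position.
    causal = 0.0
    for offset, gate in zip(offsets[:left + 1], gates[:left + 1]):
        if gate:
            causal = math.tanh(decay * causal + features[position + offset])

    # Retro-causal pass: furthest available future slot back to the current position.
    retro = 0.0
    for offset, gate in zip(reversed(offsets[left:]), reversed(gates[left:])):
        if gate:
            retro = math.tanh(decay * retro + features[position + offset])

    return causal, retro, gates
```

For the first word of a five-word sequence with a window of two slots on each side, `context_gates(0, 5, 2, 2)` closes the two left slots, so the causal state is built from the current word alone while the retro-causal state still sees two words of right context.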

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müller, A.F., Zimmermann, H.G. (2001). Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_9

  • DOI: https://doi.org/10.1007/3-540-44668-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42486-4

  • Online ISBN: 978-3-540-44668-2

  • eBook Packages: Springer Book Archive
