Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech

Chapter in Speech Technology

Abstract

This chapter addresses expressive speech processing. It attempts to explain a mechanism for expressiveness in speech and proposes a novel dimension of spoken language processing for speech technology applications. It shows that, although great progress has already been made, much remains to be done before speech processing can be considered a truly mature technology.

Acknowledgments

This work is partly supported by the Ministry of Public Management, Home Affairs, Posts, and Telecommunications, Japan under the SCOPE funding initiative. The ESP corpus was collected over a period of 5 years with support from the Japan Science & Technology Corporation (JST/CREST) Core Research for Evolutional Science & Technology funding initiative. The author also wishes to thank the management of the Spoken Language Communication Research Laboratory and the Advanced Telecommunications Research Institute International for their continuing support and encouragement of this work. The chapter was written while the author was employed by NiCT, the National Institute of Information and Communications Technology. He is currently employed by Trinity College, the University of Dublin, Ireland, as Stokes Professor of Speech & Communication Technology.

Author information

Correspondence to Nick Campbell.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Campbell, N. (2010). Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_7

  • DOI: https://doi.org/10.1007/978-0-387-73819-2_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-73818-5

  • Online ISBN: 978-0-387-73819-2

  • eBook Packages: Engineering, Engineering (R0)