Abstract
This chapter addresses the issue of expressive speech processing. It attempts to explain a mechanism for expressiveness in speech, and proposes a novel dimension of spoken language processing for speech technology applications, showing that although great progress has already been made, there is still much to be done before we can consider speech processing to be a truly mature technology.
Acknowledgments
This work is partly supported by the Ministry of Public Management, Home Affairs, Posts, and Telecommunications, Japan under the SCOPE funding initiative. The ESP corpus was collected over a period of 5 years with support from the Japan Science & Technology Corporation (JST/CREST) Core Research for Evolutional Science & Technology funding initiative. The author also wishes to thank the management of the Spoken Language Communication Research Laboratory and the Advanced Telecommunications Research Institute International for their continuing support and encouragement of this work. The chapter was written while the author was employed by NiCT, the National Institute of Information and Communications Technology. He is currently employed by Trinity College, the University of Dublin, Ireland, as Stokes Professor of Speech & Communication Technology.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Campbell, N. (2010). Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_7
DOI: https://doi.org/10.1007/978-0-387-73819-2_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2
eBook Packages: Engineering (R0)