Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech

Campbell, Nick

doi:10.1007/978-0-387-73819-2_7

Abstract

This chapter addresses the issue of expressive speech processing. It attempts to explain a mechanism for expressiveness in speech and proposes a novel dimension of spoken language processing for speech technology applications. It shows that, although great progress has already been made, much remains to be done before speech processing can be considered a truly mature technology.

Acknowledgments

This work is partly supported by the Ministry of Public Management, Home Affairs, Posts, and Telecommunications, Japan under the SCOPE funding initiative. The ESP corpus was collected over a period of 5 years with support from the Japan Science & Technology Corporation (JST/CREST) Core Research for Evolutional Science & Technology funding initiative. The author also wishes to thank the management of the Spoken Language Communication Research Laboratory and the Advanced Telecommunications Research Institute International for their continuing support and encouragement of this work. The chapter was written while the author was employed by NiCT, the National Institute of Information and Communications Technology. He is currently employed by Trinity College, the University of Dublin, Ireland, as Stokes Professor of Speech & Communication Technology.

Author information

Authors and Affiliations

Centre for Language and Communication Studies (CLCS), The University of Dublin, College Green, Dublin 2, Ireland
Nick Campbell

Corresponding author

Correspondence to Nick Campbell.

Editor information

Editors and Affiliations

Department of Computing Science & Engineering, Chalmers University of Technology, 412 96, Göteborg, Sweden
Fang Chen
Department of Speech Sciences, University of Helsinki, 9, FIN-00014, Helsinki, Finland
Kristiina Jokinen

About this chapter

Cite this chapter

Campbell, N. (2010). Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_7

DOI: https://doi.org/10.1007/978-0-387-73819-2_7
Published: 17 April 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2
eBook Packages: Engineering (R0)