Speech Under Stress: Analysis, Modeling and Recognition

Hansen, John H. L.; Patil, Sanjay

doi:10.1007/978-3-540-74200-5_6

John H. L. Hansen¹ & Sanjay Patil¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)


Abstract

In this chapter, we consider a range of issues associated with the analysis, modeling, and recognition of speech under stress. We start by defining stress, what may be perceived as stress, and how it affects the speech production system. In the discussion that follows, we explore how individuals differ in their perception of stress, and from this identify the cues associated with perceiving stress. Having considered the domains of stress and the areas for speech analysis under stress, we shift to the development of algorithms to estimate, classify, or distinguish different stress conditions. We conclude by considering what might be in store for understanding stress and for developing techniques to overcome the effects of stress in speech recognition and human-computer interactive systems.



Author information

Authors and Affiliations

Center for Robust Speech Systems, University of Texas at Dallas, Richardson, TX-75080, USA
John H. L. Hansen & Sanjay Patil


Editor information

Christian Müller


Cite this chapter

Hansen, J.H.L., Patil, S. (2007). Speech Under Stress: Analysis, Modeling and Recognition. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_6


DOI: https://doi.org/10.1007/978-3-540-74200-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science, Computer Science (R0)
