Speech Under Stress: Analysis, Modeling and Recognition

  • Chapter
Speaker Classification I

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

In this chapter, we consider a range of issues associated with the analysis, modeling, and recognition of speech under stress. We start by defining stress, what can be perceived as stress, and how it affects the speech production system. In the discussion that follows, we explore how individuals differ in their perception of stress, and hence come to understand the cues associated with perceiving stress. Having considered the domains of stress and the areas for speech analysis under stress, we turn to the development of algorithms to estimate, classify, or distinguish different stress conditions. We conclude by considering what might be in store for understanding stress and for developing techniques to overcome the effects of stress in speech recognition and human-computer interactive systems.



Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hansen, J.H.L., Patil, S. (2007). Speech Under Stress: Analysis, Modeling and Recognition. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_6

  • DOI: https://doi.org/10.1007/978-3-540-74200-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74186-2

  • Online ISBN: 978-3-540-74200-5

  • eBook Packages: Computer Science (R0)
