Speech Under Stress and Lombard Effect: Impact and Solutions for Forensic Speaker Recognition

Chapter

Abstract

In voice forensics, effective speaker recognition from input audio streams is an important task. In many situations, however, individuals will change the manner in which they produce speech because of the environment (i.e., Lombard effect), their speaker state (e.g., emotion, cognitive stress), or secondary tasks (e.g., task stress, whether physical or cognitive). Automatic recognition schemes for both speech and speaker ID are affected by the variability these conditions introduce. Extensive research on speech under stress has been performed for speech recognition, primarily for small-vocabulary isolated-word recognition. However, limited formal research has been performed for speaker ID/verification, primarily because of the lack of effective corpora in the field. This chapter addresses speech under stress, including Lombard effect, for the purposes of speaker recognition. Domains where stress/variability occurs (Lombard effect, physical stress, cognitive stress) are considered first. Next, because effective speaker recognition requires detecting whether a subject is under stress, a capability that is useful in its own right for voice forensics and biometrics, we consider prior research on the detection of speech under stress. The impact of stress on speaker recognition is then considered, and finally we address ways to improve speaker recognition in these domains (Teager energy operator (TEO) features, alternative sensors, classification schemes, etc.). While speech under stress has been studied, its effect on speaker recognition is an emerging research area that deserves further investigation.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • John H. L. Hansen (1)
  • Abhijeet Sangwan (1)
  • Wooil Kim (1)
  1. Department of Electrical Engineering, Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, Richardson, USA
