Identifying Issues in Estimating Parameters from Speech Under Lombard Effect

  • M. Aiswarya
  • D. Pravena
  • D. Govind
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 678)

Abstract

Lombard effect (LE) is the phenomenon in which a person tends to speak louder in the presence of loud noise, due to the obstruction of self-auditory feedback. The main objective of this work is to develop a dataset for studying the influence of LE on speech parameters. The proposed dataset, comprising 230 utterances from each of 10 speakers, consists of simultaneous recordings of speech and ElectroGlottoGram (EGG) signals for speech under LE as well as for neutral speech recorded in a noise-free condition. The speech under LE is recorded at 5 different levels (30 dB, 15 dB, 5 dB, 0 dB and \(-20\) dB) of babble noise. The level of LE in the developed dataset is demonstrated by comparing (a) the source parameters, (b) speaker recognition rates and (c) epoch extraction performance. For the source parameters, namely pitch and Strength of Excitation (SoE), the neutral speech and the speech under LE are compared; higher pitch and lower SoE are observed for the speech under LE. Also, lower recognition performance is observed when a Mel Frequency Cepstral Coefficient (MFCC) - Gaussian Mixture Model (GMM) based speaker recognition system, built using the neutral speech, is tested with the speech under LE obtained from the same set of speakers. Finally, comparing epoch extraction from neutral speech and from speech under LE, the utterances with LE are observed to have higher epoch deviation than the neutral utterances. All these experiments confirm the level of LE in the prepared database and also reinforce the issues in processing speech under LE for different speech processing tasks.
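As an illustration of the epoch comparison mentioned above, the following is a minimal sketch of how an epoch deviation could be computed when EGG-derived epochs are taken as the reference. The abstract does not give the authors' exact measure, so the function names, the matching rule (nearest estimated epoch) and the `tolerance` window are all assumptions made here for illustration.

```python
from bisect import bisect_left


def nearest(sorted_times, t):
    """Return the value in sorted_times (non-empty, ascending) closest to t."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda c: abs(c - t))


def mean_epoch_deviation(reference, estimated, tolerance=0.002):
    """Mean absolute timing error (seconds) between matched epochs.

    reference: epoch times taken from the EGG signal (assumed ground truth)
    estimated: epoch times extracted from the speech signal
    tolerance: hypothetical matching window in seconds; a reference epoch
               with no estimate inside it is counted as a miss and is
               left out of the average
    Returns (mean deviation or None, number of misses).
    """
    estimated = sorted(estimated)
    if not estimated:
        return None, len(reference)
    errors, misses = [], 0
    for r in reference:
        err = abs(nearest(estimated, r) - r)
        if err <= tolerance:
            errors.append(err)
        else:
            misses += 1
    mean = sum(errors) / len(errors) if errors else None
    return mean, misses


# Made-up epoch times in seconds, for illustration only (not from the dataset).
egg_epochs = [0.010, 0.020, 0.030]
speech_epochs = [0.0101, 0.0203, 0.0350]
deviation, missed = mean_epoch_deviation(egg_epochs, speech_epochs)
print(deviation, missed)
```

Under this sketch, running the same function on neutral and LE utterances and comparing the two mean deviations would mirror the comparison described in the abstract; a larger value for LE speech would correspond to the "higher epoch deviation" reported there.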

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
