Identifying Issues in Estimating Parameters from Speech Under Lombard Effect

Aiswarya, M.; Pravena, D.; Govind, D.

doi:10.1007/978-3-319-67934-1_22

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 678)

Included in the following conference series:

International Symposium on Signal Processing and Intelligent Recognition Systems

Abstract

The Lombard effect (LE) is the phenomenon in which a person tends to speak louder in the presence of loud noise because the noise obstructs self-auditory feedback. The main objective of this work is to develop a dataset for studying the effect of LE on speech parameters. The proposed dataset, comprising 230 utterances from each of 10 speakers, consists of simultaneous recordings of speech and ElectroGlottoGram (EGG) signals for speech under LE, as well as neutral speech recorded in a noise-free condition. The speech under LE is recorded at 5 different levels (30 dB, 15 dB, 5 dB, 0 dB and \(-20\) dB) of babble noise. The level of LE in the developed dataset is demonstrated by comparing (a) the source parameters, (b) speaker recognition rates and (c) epoch extraction performance. For the source parameters, pitch and Strength of Excitation (SoE) are compared between neutral speech and speech under LE; the speech under LE shows higher pitch and lower SoE. Recognition performance also drops when a speaker recognition system based on Mel Frequency Cepstral Coefficients (MFCC) and Gaussian Mixture Models (GMM), built using neutral speech, is tested with speech under LE from the same set of speakers. Finally, a comparison of epoch extraction from neutral speech and speech under LE shows that the utterances with LE have higher epoch deviation than neutral speech. Together, these experiments confirm the level of LE in the prepared database and reinforce the issues that arise in processing speech under LE for different speech processing tasks.
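
The abstract does not define how epoch deviation is measured. As a rough illustration only, the sketch below assumes it is the mean absolute timing difference between each reference epoch (for example, one derived from the EGG signal) and the nearest epoch estimated from the speech signal. The function names and the epoch times are hypothetical and are not taken from the paper or its dataset.

```python
from bisect import bisect_left


def nearest(sorted_vals, x):
    """Return the element of a sorted, non-empty list that is closest to x."""
    i = bisect_left(sorted_vals, x)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_vals[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - x))


def mean_epoch_deviation_ms(reference_s, estimated_s):
    """Mean absolute deviation (ms) of each reference epoch from its nearest estimate.

    Both arguments are epoch locations in seconds. This metric is an
    assumption made for illustration, not the paper's stated definition.
    """
    if not reference_s or not estimated_s:
        raise ValueError("both epoch lists must be non-empty")
    est = sorted(estimated_s)
    errors = [abs(nearest(est, r) - r) for r in reference_s]
    return 1000.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up epoch times (seconds), chosen only to show the comparison.
    reference = [0.010, 0.020, 0.030]
    neutral_estimate = [0.0101, 0.0199, 0.0302]
    lombard_estimate = [0.0105, 0.0194, 0.0308]
    print("neutral:", mean_epoch_deviation_ms(reference, neutral_estimate))
    print("lombard:", mean_epoch_deviation_ms(reference, lombard_estimate))
```

Under this assumed metric, a larger value for the LE recordings than for the neutral recordings would correspond to the "higher epoch deviation" the abstract reports.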

Author information

Authors and Affiliations

Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, 641112, Tamilnadu, India
M. Aiswarya, D. Pravena & D. Govind

Corresponding author

Correspondence to M. Aiswarya.

Editor information

Editors and Affiliations

School of CS/IT, Indian Institute of Information Technology and Management, Trivandrum, Kerala, India
Sabu M. Thampi
Department of Electrical and Computer Engineering, Ryerson University, Toronto, Ontario, Canada
Sri Krishnan
Department of Computer Science, University of Salamanca, Salamanca, Salamanca, Spain
Juan Manuel Corchado Rodriguez
Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Swagatam Das
Department of Systems and Computer Networks, Wroclaw University of Science and Technology, Wroclaw, Poland
Michal Wozniak
Faculty of Engineering and Technology, Liverpool John Moores University, Liverpool, United Kingdom
Dhiya Al-Jumeily

About this paper

Cite this paper

Aiswarya, M., Pravena, D., Govind, D. (2018). Identifying Issues in Estimating Parameters from Speech Under Lombard Effect. In: Thampi, S., Krishnan, S., Corchado Rodriguez, J., Das, S., Wozniak, M., Al-Jumeily, D. (eds) Advances in Signal Processing and Intelligent Recognition Systems. SIRS 2017. Advances in Intelligent Systems and Computing, vol 678. Springer, Cham. https://doi.org/10.1007/978-3-319-67934-1_22

DOI: https://doi.org/10.1007/978-3-319-67934-1_22
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67933-4
Online ISBN: 978-3-319-67934-1
eBook Packages: Engineering, Engineering (R0)