Effect of Gender and Sound Spatialization on Speech Intelligibility in Multiple Speaker Environment

  • M. Joshi
  • M. Iyer
  • N. Gupta
  • A. Barreto
Conference paper


In multiple-speaker environments such as teleconferences, intelligibility suffers, particularly when the sound is monaural. In this study, we exploit the "Cocktail Party Effect", in which a listener can isolate one sound from all others using sound localization and gender cues. To improve the clarity of speech, each speaker is assigned a direction using Head Related Transfer Functions (HRTFs), creating an auditory map of multiple conversations. A mixture of male and female voices is used to improve comprehension.
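As a rough illustration of assigning each speaker a direction, the sketch below convolves each mono voice with a left/right impulse-response pair and sums the results into one stereo stream. It is not the authors' implementation: `toy_hrir`, `spatialize`, and `mix_speakers` are hypothetical names, and the toy filter is a crude delay-and-gain stand-in (Woodworth's spherical-head approximation, with an assumed head radius and attenuation) for measured HRTFs. It models azimuth only, not elevation.

```python
import numpy as np

HRIR_LENGTH = 64  # taps in the toy impulse responses (assumed)

def toy_hrir(azimuth_deg, fs=44100):
    """Return a (left, right) impulse-response pair for one direction.

    Crude stand-in for a measured HRIR: the far ear gets a delayed,
    attenuated impulse. Positive azimuth = source to the right.
    Intended for azimuths between -90 and 90 degrees.
    """
    head_radius, speed_of_sound = 0.0875, 343.0  # metres, m/s (assumed)
    theta = np.radians(abs(azimuth_deg))
    # Woodworth's approximation of the interaural time difference
    itd = head_radius / speed_of_sound * (theta + np.sin(theta))
    delay = int(round(itd * fs))  # far-ear lag in samples
    near = np.zeros(HRIR_LENGTH)
    near[0] = 1.0
    far = np.zeros(HRIR_LENGTH)
    far[delay] = 1.0 - 0.4 * np.sin(theta)  # assumed level difference
    return (far, near) if azimuth_deg > 0 else (near, far)

def spatialize(mono, hrir_left, hrir_right):
    """Filter a mono signal into a two-column (left, right) array."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

def mix_speakers(signals, azimuths_deg, fs=44100):
    """Place each mono signal at its azimuth and sum into one stereo mix."""
    n = max(len(s) for s in signals) + HRIR_LENGTH - 1
    out = np.zeros((n, 2))
    for sig, az in zip(signals, azimuths_deg):
        binaural = spatialize(sig, *toy_hrir(az, fs))
        out[:len(binaural)] += binaural
    return out
```

With measured HRTFs, only `toy_hrir` would change: it would look up the impulse-response pair recorded for the requested azimuth and elevation instead of synthesizing one.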

We see a 6% improvement in cognition when a male voice is used in a female-dominated environment and a 16% improvement in the reverse case. An improvement of 41% is observed when sound localization with varying elevations is used. Finally, the improvement in cognition rises to 71% when both elevations and azimuths are varied. Compared to our previous study, where only azimuths were varied, combining azimuths and elevations gives better results (71% vs. 57%).


Impulse Response; Listening Test; Interaural Time Difference; Speech Intelligibility; Female Voice
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  [1] M. Joshi, K. Kotakonda, N. Gupta, and A. Barreto, “Improving Intelligibility of Teleconferences Using Binaural Sounds”, REV 2009.
  [2] K. J. Faller II, A. Barreto, N. Gupta, and N. Rishe, “Performance Comparison of Two Identification Methods for Analysis of Head Related Impulse Responses”, CISSE 2005.
  [3] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization Using Non-individualized Head-Related Transfer Functions”, J. Acoust. Soc. Amer., Vol. 94, 111-123, 1993.
  [4] D. R. Begault, “A head-up auditory display for TCAS advisories”, Human Factors, 35, 707-717, 1993.
  [5] AuSIM, Inc., “HeadZap: AuSIM3D HRTF Measurement System Manual”, AuSIM, Inc., 4962 El Camino Real, Suite 101, Los Altos, CA 94022, 2000.
  [6] N. Gupta, A. Barreto, and C. Ordonez, “Spectral Modification of Head-Related Transfer Functions for Improved Virtual Sound Spatialization”, Proceedings of the 2002 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2002), May 13-17, 2002.
  [8] E. C. Cherry, “Some experiments on the recognition of speech, with one and with two ears”, Journal of Acoustical Society of America 25(5), 975-979, 1953.

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • M. Joshi (1)
  • M. Iyer (1)
  • N. Gupta (1)
  • A. Barreto (2)
  1. University of Bridgeport, Bridgeport, U.S.A.
  2. Florida International University, Miami, U.S.A.
