DRT Evaluation of Localized Speech Intelligibility in Virtual 3-D Acoustic Space

  • Kazuhiro Kondo
Part of the Signals and Communication Technology book series (SCT)


In this chapter, we describe the application of the proposed Japanese DRT to measuring the intelligibility of localized Japanese speech in a virtual three-dimensional environment. Speech is localized to a specified position by convolving it with the head-related transfer function (HRTF), which is the transfer function of the path between the source and each of the listener’s ears. We have been attempting to use speech localization to place many of the potential speech sources at distant locations within the virtual space, so that the intelligibility of the localized speech is maintained while the “presence” of the other audio sources is preserved. The intended applications include multi-party audio conferencing and augmented audio reality applications that overlay localized speech on the actual acoustic space.
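
The localization step described above can be illustrated with a minimal sketch. It assumes the time-domain form of the HRTF, the head-related impulse response (HRIR), with one response per ear. The function names and the toy three-tap responses are hypothetical and serve only as an illustration; they are not measured data and not the chapter's implementation.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    The output has len(signal) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def localize(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for a mono source.

    Each ear signal is the mono source convolved with that ear's HRIR.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed, NOT measured): a source on the listener's
# left, so the right ear receives a delayed and attenuated copy.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono = [1.0, 2.0, 3.0]
left, right = localize(mono, hrir_left, hrir_right)
print(left)
print(right)
```

In practice the HRIRs would come from a measured set for the desired azimuth and elevation, and a fast (FFT-based) convolution would usually replace the direct-form loop for longer signals.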


Virtual Space; Speech Intelligibility; Real Source; Target Speech; Phonetic Feature



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Electrical Engineering, Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
