Abstract
In this paper, we investigate the influence of stereo coding on Japanese speech localized in virtual 3-D space. Localized speech was encoded using the joint stereo and parametric stereo modes of the HE-AAC encoder. First, we evaluated the subjective quality of speech localized at various azimuths on the horizontal plane relative to the listener, using the standard MUSHRA test, and compared the quality obtained with the different stereo coding modes. The joint stereo mode yielded significantly higher MUSHRA scores than the parametric stereo mode at azimuths of ±45 degrees. Next, we measured Japanese word intelligibility using the Japanese Diagnostic Rhyme Test. Test speech was first localized at 0 and ±45 degrees and compared with localized speech that had not been coded. Parametric stereo-coded speech scored lower when localized at -45 degrees, but all other coded speech showed no difference from uncoded speech. Test speech was then localized in front of the listener while competing noise was localized at various angles, and the two stereo coding modes were tested at bit rates of 56, 32, and 24 kbps. In most cases, the coded speech was as intelligible as uncoded speech at all noise azimuths. These results indicate that stereo coding has almost no effect on intelligibility in the bit rate range tested.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kobayashi, Y., Kondo, K., Nakagawa, K. (2010). Intelligibility of HE-AAC Coded Japanese Words with Various Stereo Coding Modes in Virtual 3D Audio Space. In: Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K. (eds) Auditory Display. CMMR ICAD 2009. Lecture Notes in Computer Science, vol 5954. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12439-6_12
DOI: https://doi.org/10.1007/978-3-642-12439-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12438-9
Online ISBN: 978-3-642-12439-6
eBook Packages: Computer Science (R0)