Multichannel Audio Coding for Multimedia Services in Intelligent Environments

Mouchtaris, Athanasios; Tsakalides, Panagiotis

doi:10.1007/978-3-540-78502-6_5

Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 120)

Summary

Audio is an integral component of multimedia services in intelligent environments. Using multiple channels for audio capture and rendering makes it possible to recreate arbitrary acoustic environments and immerse the listener in the acoustic scene. On the other hand, multichannel audio contains a large amount of information, which is highly demanding to transmit, especially for real-time applications. For this reason, a variety of compression methods have been developed for multichannel audio content. In this chapter, we first describe the currently popular methods for multichannel audio compression. Low-bitrate encoding methods for multichannel audio have also recently begun to attract interest, mostly aimed at extending MP3 audio coding to multichannel audio recordings, and these methods are also examined here. Synthesizing a truly immersive intelligent audio environment requires interactivity between the user(s) and the audio environment. Towards this goal, we present recently proposed models specific to multichannel audio, namely the source/filter and the sinusoidal models, which allow for flexible manipulation and high-quality low-bitrate encoding, and are tailored for applications such as remote mixing and distributed immersive performances.

Author information

Authors and Affiliations

Institute of Computer Science (ICS), Foundation for Research and Technology – Hellas (FORTH), Crete, Greece
Athanasios Mouchtaris & Panagiotis Tsakalides
Department of Computer Science, University of Crete, Greece
Athanasios Mouchtaris & Panagiotis Tsakalides

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Karaoli-Dimitriou Str. 80, 185 34, Piraeus, Greece
George A. Tsihrintzis
School of Electrical & Information Engineering, University of South Australia KES Centre, Mawson Lakes Campus, Adelaide, SA, 5095, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Mouchtaris, A., Tsakalides, P. (2008). Multichannel Audio Coding for Multimedia Services in Intelligent Environments. In: Tsihrintzis, G.A., Jain, L.C. (eds) Multimedia Services in Intelligent Environments. Studies in Computational Intelligence, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78502-6_5

DOI: https://doi.org/10.1007/978-3-540-78502-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78491-3
Online ISBN: 978-3-540-78502-6
eBook Packages: Engineering, Engineering (R0)