Audio Coding Standards, (Proprietary) Audio Compression Algorithms, and Broadcasting/Speech/Data Communication Codecs: Overview of Adopted Filter Banks

  • Vladimir Britanak
  • K. R. Rao
Chapter

Abstract

In general, audio coding (audio compression) algorithms produce a compact digital representation of high-quality audio signals for efficient transmission and storage. The central objective is to represent the signal with the minimum number of bits while still achieving transparent reproduction. Alongside speech coding schemes based on linear prediction, which are tailored specifically for efficient speech compression, perceptual transform-based audio coding schemes have gained greater attention, particularly for applications in consumer electronics. Typically, a transform-based audio coding scheme uses a near-perfect reconstruction quadrature mirror filter (QMF) bank and/or a perfect reconstruction cosine-modulated filter bank to obtain a block-wise representation of the audio signal in the frequency domain. This chapter briefly reviews the perceptual transform-based audio coding schemes developed to date, including the family of ISO/IEC MPEG audio coding standards, proprietary audio compression algorithms, broadcasting/speech/data communication codecs, and open-source, royalty-free audio/speech codecs. The discussion concentrates on the adopted near-perfect reconstruction QMF banks and perfect reconstruction cosine-modulated filter banks, the processing methods, and the specified transform block sizes.
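
To make the idea of a perfect reconstruction cosine-modulated filter bank concrete, the following minimal sketch uses the modified discrete cosine transform (MDCT) with 50% overlap, a widely used instance of such a filter bank. It is an illustration only and is not taken from the chapter: the sine window, the block size N = 8, and all function names are assumptions chosen for brevity, and the direct O(N^2) evaluation is far slower than the fast algorithms used in real codecs.

```python
import math


def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (assumed choice).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, window):
    # Forward MDCT: 2N windowed input samples -> N coefficients.
    N = len(block) // 2
    n0 = 0.5 + N / 2
    return [
        sum(window[n] * block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, window):
    # Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples.
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [
        (2.0 / N) * window[n]
        * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]


def analysis_synthesis(signal, N):
    # Block-wise analysis and overlap-add synthesis with hop size N.
    # Assumes len(signal) is a multiple of N; zero-pads N samples on each side
    # so that every signal sample is covered by two overlapping blocks.
    window = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        coeffs = mdct(padded[start:start + 2 * N], window)
        for n, y in enumerate(imdct(coeffs, window)):
            out[start + n] += y  # aliasing cancels between adjacent blocks
    return out[N:N + len(signal)]


N = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]
reconstructed = analysis_synthesis(signal, N)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

Each block of 2N samples yields only N coefficients, so the representation is critically sampled; the time-domain aliasing introduced in one block is cancelled by its neighbours during overlap-add, which is why `max_error` stays at the level of floating-point rounding.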

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Vladimir Britanak (1)
  • K. R. Rao (2)
  1. Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia
  2. The University of Texas at Arlington, Arlington, USA
