Abstract

While previous MPEG Audio standards focused mainly on representing audio signals at or close to CD quality, the new MPEG-4 Audio standard extends the range of applicability towards significantly lower bit rates. Furthermore, it offers extended functionalities for representing natural and even synthetic audio signals in an object-oriented fashion. This paper gives a brief overview of the complete audio part of the MPEG-4 standard and more detailed information on the parts related to speech coding.

Additional information

This paper was written while the author was research visitor at Lucent Technologies, Bell Laboratories, Murray Hill, NJ, USA.

About this article

Cite this article

Edler, B. Speech coding in MPEG-4. Int J Speech Technol 2, 289–303 (1999). https://doi.org/10.1007/BF02108645

  • DOI: https://doi.org/10.1007/BF02108645
