Abstract

Speech and audio coding underlie many of the products and services that we have come to rely on and enjoy today. In this chapter, we discuss speech and audio coding, including a concise background summary, key coding methods, and the latest standards, with an eye toward current limitations and possible future research directions.

Author information

Corresponding author

Correspondence to Jerry D. Gibson.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Gibson, J.D. (2015). Challenges in Speech Coding Research. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_2

  • DOI: https://doi.org/10.1007/978-1-4939-1456-2_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1455-5

  • Online ISBN: 978-1-4939-1456-2

  • eBook Packages: Engineering, Engineering (R0)
