Abstract
Speech and audio coding underlie many of the products and services that we have come to rely on and enjoy today. In this chapter, we discuss speech and audio coding, including a concise background summary, key coding methods, and the latest standards, with an eye toward current limitations and possible future research directions.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Gibson, J.D. (2015). Challenges in Speech Coding Research. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds) Speech and Audio Processing for Coding, Enhancement and Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1456-2_2
DOI: https://doi.org/10.1007/978-1-4939-1456-2_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1455-5
Online ISBN: 978-1-4939-1456-2
eBook Packages: Engineering, Engineering (R0)