Scalable and Multi-Rate Speech Coding for Voice-over-Internet Protocol (VoIP) Networks

  • Tokunbo Ogunfunmi
  • Koji Seto


Speech remains a popular and effective means of conveying information from one person to another, and speech signals form the basic method of human communication. The information conveyed in this case is verbal, or auditory. The methods used for speech coding are extensive and continue to evolve.

Speech coding can be defined as the means by which the information-bearing speech signal is coded to remove redundancy, thereby reducing transmission bandwidth requirements, improving storage efficiency, and making possible the many other applications that rely on speech coding techniques.

The medium of speech transmission has also changed over the years. Currently, a large percentage of speech is communicated over channels that use Internet protocols. Voice-over-Internet Protocol (VoIP) channels present challenges that must be overcome to enable error-free, robust speech communication.

There are several advantages to using multi-rate, scalable bit-streams over time-varying VoIP channels. In this chapter, we present methods for scalable, multi-rate speech coding for VoIP channels.


Keywords: Packet Loss, Frame Erasure Concealment, Enhancement Layer, Voice Over Internet Protocol, Linear Predictive Code



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Electrical Engineering, Santa Clara University, Santa Clara, USA
