
Voice over IP: Speech Transmission over Packet Networks


Part of the book series: Springer Handbooks (SHB)

Abstract

The emergence of packet networks that carry both data and voice traffic has introduced challenges for speech transmission design that differ significantly from those encountered in traditional circuit-switched telephone networks, such as the public switched telephone network (PSTN). In this chapter, we present the many aspects that affect speech quality in a voice over IP (VoIP) conversation. We also present design techniques for coding systems that aim to overcome the deficiencies of the packet channel. With speech codecs tailored for packet networks, VoIP can in fact deliver higher quality than is possible with the PSTN.

Abbreviations

ADPCM: adaptive differential pulse code modulation
AEC: acoustic echo cancelation
AGC: automatic gain control
ARQ: automatic repeat request
CELP: code-excited linear prediction
CNG: comfort noise generation
DPCM: differential PCM
FEC: forward error correction
FIFO: first-in first-out
GSM: Groupe Spécial Mobile
IETF: Internet Engineering Task Force
IP: internet protocol
ITU: International Telecommunication Union
LAN: local-area network
LPC: linear prediction coefficients
MDC: multiple description coding
MOS: mean opinion score
OLA: overlap-and-add
OSI: open systems interconnection reference model
PCM: pulse-code modulation
PDA: pitch determination algorithms
PESQ: perceptual evaluation of speech quality
PLC: packet loss concealment
PSTN: public switched telephone network
QoS: quality-of-service
RS: Reed-Solomon
RSVP: resource reservation protocol
RTP: real-time transport protocol
SOLA: synchronized overlap add
TCP: transmission control protocol
UDP: user datagram protocol
VAD: voice activity detector
VoIP: voice over IP
WLAN: wireless LAN
WSOLA: waveform similarity OLA
WiFi: wireless fidelity
XOR: exclusive-or
iLBC: internet low-bit-rate codec

Author information

Corresponding authors

Correspondence to Jan Skoglund, Ph.D., Ermin Kozica, M.Sc., Jan Linden, Ph.D., Dr. Roar Hagen, or Prof. W. Bastiaan Kleijn.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Skoglund, J., Kozica, E., Linden, J., Hagen, R., Kleijn, W. (2008). Voice over IP: Speech Transmission over Packet Networks. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_15

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
