Voice over IP: Speech Transmission over Packet Networks

Skoglund, Jan; Kozica, Ermin; Linden, Jan; Hagen, Roar; Kleijn, W. Bastiaan

doi:10.1007/978-3-540-49127-9_15

Part of the book series: Springer Handbooks (SHB)

Abstract

The emergence of packet networks for both data and voice traffic has introduced new challenges for speech transmission designs that differ significantly from those encountered and handled in traditional circuit-switched telephone networks, such as the public switched telephone network (PSTN). In this chapter, we present the many aspects that affect speech quality in a voice over IP (VoIP) conversation. We also present design techniques for coding systems that aim to overcome the deficiencies of the packet channel. By properly utilizing speech codecs tailored for packet networks, VoIP can in fact produce a quality higher than that possible with PSTN.

Abbreviations

ADPCM:: adaptive differential pulse code modulation
AEC:: acoustic echo cancelation
AGC:: automatic gain control
ARQ:: automatic repeat request
CELP:: code-excited linear prediction
CNG:: comfort noise generation
DPCM:: differential PCM
FEC:: forward error correction
FIFO:: first-in first-out
GSM:: Groupe Spécial Mobile
IETF:: Internet Engineering Task Force
IP:: internet protocol
ITU:: International Telecommunication Union
LAN:: local-area network
LPC:: linear prediction coefficients
MDC:: multiple description coding
MOS:: mean opinion score
OLA:: overlap-and-add
OSI:: open systems interconnection reference model
PCM:: pulse-code modulation
PDA:: pitch determination algorithms
PESQ:: perceptual evaluation of speech quality
PLC:: packet loss concealment
PSTN:: public switched telephone network
QoS:: quality-of-service
RS:: Reed-Solomon
RSVP:: resource reservation protocol
RTP:: real-time transport protocol
SOLA:: synchronized overlap add
TCP:: transmission control protocol
UDP:: user datagram protocol
VAD:: voice activity detector
VoIP:: voice over IP
WLAN:: wireless LAN
WSOLA:: waveform similarity OLA
WiFi:: wireless fidelity
XOR:: exclusive-or
iLBC:: internet low-bit-rate codec

Author information

Authors and Affiliations

Global IP Solutions, 301 Brannan Street, 94107, San Francisco, CA, USA
Jan Skoglund Ph.D
School of Electrical Engineering, Sound and Image Processing Laboratory, Royal Institute of Technology (KTH), Osquldas väg 10, 10044, Stockholm, Sweden
Ermin Kozica M.Sc
Global IP Solutions, 301 Brannan Street, 94107, San Francisco, CA, USA
Jan Linden Ph.D
Global IP Solutions, Magnus Ladulåsgatan 63B, 118 27, Stockholm, Sweden
Roar Hagen Dr.
School of Electrical Engineering, Sound and Image Processing Lab, Royal Institute of Technology (KTH), Osquldas väg 10, 10044, Stockholm, Sweden
W. Bastiaan Kleijn Prof.

Corresponding authors

Correspondence to Jan Skoglund Ph.D, Ermin Kozica M.Sc, Jan Linden Ph.D, Roar Hagen Dr. or W. Bastiaan Kleijn Prof.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Skoglund, J., Kozica, E., Linden, J., Hagen, R., Kleijn, W.B. (2008). Voice over IP: Speech Transmission over Packet Networks. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_15

DOI: https://doi.org/10.1007/978-3-540-49127-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
