Abstract
The emergence of packet networks for both data and voice traffic has introduced new challenges for speech transmission designs that differ significantly from those encountered and handled in traditional circuit-switched telephone networks, such as the public switched telephone network (PSTN). In this chapter, we present the many aspects that affect speech quality in a voice over IP (VoIP) conversation. We also present design techniques for coding systems that aim to overcome the deficiencies of the packet channel. By properly utilizing speech codecs tailored for packet networks, VoIP can in fact produce a quality higher than that possible with PSTN.
Abbreviations
- ADPCM: adaptive differential pulse code modulation
- AEC: acoustic echo cancelation
- AGC: automatic gain control
- ARQ: automatic repeat request
- CELP: code-excited linear prediction
- CNG: comfort noise generation
- DPCM: differential PCM
- FEC: forward error correction
- FIFO: first-in first-out
- GSM: Groupe Spécial Mobile
- IETF: Internet Engineering Task Force
- IP: internet protocol
- ITU: International Telecommunication Union
- LAN: local-area network
- LPC: linear prediction coefficients
- MDC: multiple description coding
- MOS: mean opinion score
- OLA: overlap-and-add
- OSI: open systems interconnection reference model
- PCM: pulse-code modulation
- PDA: pitch determination algorithms
- PESQ: perceptual evaluation of speech quality
- PLC: packet loss concealment
- PSTN: public switched telephone network
- QoS: quality-of-service
- RS: Reed-Solomon
- RSVP: resource reservation protocol
- RTP: real-time transport protocol
- SOLA: synchronized overlap add
- TCP: transmission control protocol
- UDP: user datagram protocol
- VAD: voice activity detector
- VoIP: voice over IP
- WLAN: wireless LAN
- WSOLA: waveform similarity OLA
- WiFi: wireless fidelity
- XOR: exclusive-or
- iLBC: internet low-bit-rate codec
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Skoglund, J., Kozica, E., Linden, J., Hagen, R., Kleijn, W. (2008). Voice over IP: Speech Transmission over Packet Networks. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_15
DOI: https://doi.org/10.1007/978-3-540-49127-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)