Summary
The historic “coding gap” between high-rate coding of narrow- and wide-band speech on the one hand and low-rate coding of narrow-band speech on the other has been steadily bridged over the past 15 years. The GSM coder of 1990 was an important milestone in this process, as it prompted increasing research towards better quality and higher compression. These research efforts, together with other relevant activities worldwide, have helped to close the gap. In the following, the concepts behind this progress are explained. Special focus is placed on the basis of this success, namely that a breakthrough was finally achieved for narrow-band speech, allowing good quality at medium-to-low rates. For wide-band speech, the same holds at medium rates. The concepts applied to speech coding were also applied to music, though with some noticeable conceptual differences. While time-domain approaches prevail for speech, frequency-domain coding has turned out to be more successful for audio. A characteristic feature there is the extensive exploitation of psycho-acoustical phenomena.
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Heute, U. (2005). Speech and Audio Coding — Aiming at High Quality and Low Data Rates. In: Blauert, J. (ed.) Communication Acoustics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27437-5_14
DOI: https://doi.org/10.1007/3-540-27437-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22162-3
Online ISBN: 978-3-540-27437-7
eBook Packages: Engineering, Engineering (R0)