Speech and Audio Coding — Aiming at High Quality and Low Data Rates

Heute, Ulrich

doi:10.1007/3-540-27437-5_14


Summary

The historic “coding gap” between high-rate coding of narrow- and wide-band speech on the one hand, and low-rate coding of narrow-band speech on the other, has been increasingly bridged during the past 15 years. The GSM coder of 1990 was a very important milestone in this process, as it prompted increasing research towards better quality and higher compression. These research efforts, together with other relevant activities worldwide, have helped to close the gap. In the following, the concepts behind this progress are explained. A special focus is put on the basis of this success, namely, the breakthrough finally achieved for narrow-band speech, which allows for good quality at medium-to-low rates. For wide-band speech, the same holds true at medium rates. The concepts applied to speech coding were also applied to music, yet with some noticeable conceptual differences in the outcome. While time-domain approaches prevail for speech, frequency-domain coding turned out to be more successful for audio. A characteristic feature of audio coding is its extensive exploitation of psycho-acoustical phenomena.



Author information

Authors and Affiliations

Institute for Circuit and System Theory, Faculty of Engineering, Christian-Albrecht University, Kiel
Ulrich Heute


Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-University Bochum, 44780, Bochum, Germany
Jens Blauert


Cite this chapter

Heute, U. (2005). Speech and Audio Coding — Aiming at High Quality and Low Data Rates. In: Blauert, J. (eds) Communication Acoustics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27437-5_14


DOI: https://doi.org/10.1007/3-540-27437-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22162-3
Online ISBN: 978-3-540-27437-7
eBook Packages: Engineering
