
Speech and Audio Coding — Aiming at High Quality and Low Data Rates

Chapter in: Communication Acoustics

Summary

The historic “coding gap” between high-rate coding of narrow- and wide-band speech on the one hand and low-rate coding of narrow-band speech on the other has been increasingly bridged during the past 15 years. The GSM coder of 1990 was an important milestone in this process, as it prompted growing research towards better quality and higher compression. These research efforts, together with other relevant activities worldwide, have helped to close the gap. In the following, the concepts behind this progress are explained. Special focus is put on the basis of this success, namely the fact that a breakthrough was finally achieved for narrow-band speech, allowing good quality at medium-to-low rates; for wide-band speech, the same holds at medium rates. The concepts applied to speech coding were also applied to music, but with some noticeable conceptual differences: while time-domain approaches prevail for speech, frequency-domain coding has turned out to be more successful for audio. A characteristic feature of audio coding is the extensive exploitation of psychoacoustic phenomena.
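The time-domain approaches that prevail for speech are largely built on linear prediction. The Python sketch below is not from the chapter and illustrates only that core step. It estimates predictor coefficients from a signal's autocorrelation using the Levinson-Durbin recursion. It then measures the prediction gain, which is how far the residual power drops below the signal power. As an assumption, a synthetic second-order autoregressive signal stands in for a short speech frame. Real coders differ in several ways:

* They run the analysis frame by frame.
* They use higher predictor orders.
* They add quantization, long-term (pitch) prediction and excitation coding.

None of these steps is shown here. All function names are hypothetical.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a[1..order] (hypothetical helper).

    Prediction convention: x_hat[n] = sum_{k=1..order} a[k] * x[n - k].
    Returns (coefficients a[1..order], final prediction-error power).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient of stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error power shrinks at every stage
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_{k=1..p} a_k x[n-k], zero initial state."""
    e = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e


def power(x):
    """Mean-square value of the sequence x."""
    return sum(v * v for v in x) / len(x)


# Synthetic AR(2) signal, an assumed stand-in for a speech frame
rng = random.Random(0)
a_true = (1.3, -0.8)
x, y1, y2 = [], 0.0, 0.0
for _ in range(4100):
    y = a_true[0] * y1 + a_true[1] * y2 + rng.gauss(0.0, 1.0)
    y2, y1 = y1, y
    x.append(y)
x = x[100:]  # drop the start-up transient

coeffs, _ = levinson_durbin(autocorrelation(x, 2), 2)
e = residual(x, coeffs)
gain_db = 10.0 * math.log10(power(x) / power(e))
print([round(c, 2) for c in coeffs], round(gain_db, 1))
```

The estimated coefficients should come out close to the generating values. A positive prediction gain means the residual has less power than the signal. Roughly speaking, the residual can then be coded with fewer bits for comparable quality.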




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Heute, U. (2005). Speech and Audio Coding — Aiming at High Quality and Low Data Rates. In: Blauert, J. (eds) Communication Acoustics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27437-5_14


  • DOI: https://doi.org/10.1007/3-540-27437-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22162-3

  • Online ISBN: 978-3-540-27437-7

