Low delay coder (< 25 ms) of wideband audio (20 Hz-15 kHz) scalable from 64 to 32 kbit/s

Codeur Audio (20Hz-15kHz) Hiérarchique (64-32 kbit/s) et À Faible Retard (< 25 ms)

Annales Des Télécommunications

Abstract

A low-delay coder for speech and music signals sampled at 32 kHz is described. Its algorithmic delay does not exceed 25 ms, which enables audioconferencing applications without echo cancellation. Its bit rate is scalable between 64 and 32 kbit/s in steps of 8 kbit/s. The transmitter issues the binary code at 64 kbit/s with the lower bit rate codes embedded in it. The receiver may operate at lower bit rates with a gradual loss of quality. The proposed coder is based on a mixed scheme: the adopted solution combines elements from the CELP speech coder and from frequency-domain music coders. The perceptual signal is obtained in the time domain, then transformed to the frequency domain, where the bit allocation is calculated and the transform coefficients are quantized. A first solution based on the DFT is discussed, then a second solution based on an MDCT with small overlap is applied. The coefficients are quantized as follows. First, a prediction of the whole spectrum is applied. Then a mean-removed gain-shape split VQ is used for amplitude spectrum quantization, and a hierarchical two-dimensional VQ is used for phase spectrum quantization with amplitude correction. At the phase quantization stage, each codeword describing the selected vector index is split into parts corresponding to the different bit rates. Owing to the hierarchical codebook structure, truncated indices may be used without much affecting the signal quality. Simulation results are presented and the robustness of the proposed coder is examined.
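The idea of truncatable indices in a hierarchical codebook can be illustrated with a minimal sketch. The abstract does not describe the actual codebook, so the example below swaps in a simple stand-in: a binary-tree partition of the unit square, where each index bit halves the current cell along alternating dimensions. The function names `encode` and `decode`, the input range [0, 1)^2, and the bit counts are all assumptions made for illustration, not the authors' design.

```python
def encode(vec, n_bits):
    """Return n_bits index bits (most significant first) for a 2-D
    vector in [0, 1)^2, using a hypothetical binary-tree codebook."""
    lo = [0.0, 0.0]
    hi = [1.0, 1.0]
    bits = []
    for level in range(n_bits):
        dim = level % 2                    # alternate between the two dimensions
        mid = 0.5 * (lo[dim] + hi[dim])
        if vec[dim] >= mid:
            bits.append(1)
            lo[dim] = mid                  # keep the upper half of the cell
        else:
            bits.append(0)
            hi[dim] = mid                  # keep the lower half of the cell
    return bits


def decode(bits):
    """Return the centre of the cell addressed by a (possibly truncated)
    index. Fewer bits address a larger cell, hence a coarser estimate."""
    lo = [0.0, 0.0]
    hi = [1.0, 1.0]
    for level, bit in enumerate(bits):
        dim = level % 2
        mid = 0.5 * (lo[dim] + hi[dim])
        if bit:
            lo[dim] = mid
        else:
            hi[dim] = mid
    return (0.5 * (lo[0] + hi[0]), 0.5 * (lo[1] + hi[1]))


if __name__ == "__main__":
    vec = (0.3, 0.8)
    full_index = encode(vec, 6)            # index sent by the transmitter
    for kept in (6, 4, 2):                 # receiver keeps only a prefix
        print(kept, decode(full_index[:kept]))
```

Because every prefix of the index is itself a valid, coarser index, a receiver working at a lower bit rate can drop the trailing bits and still decode. In this sketch only the worst-case error bound shrinks with each extra bit; the error for one particular vector need not decrease monotonically.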

Résumé (English translation)

A low-delay coder suited to speech and music signals sampled at 32 kHz is described. The algorithmic delay does not exceed 25 ms, which permits audioconferencing-type applications without an echo cancellation procedure. The bit rate is hierarchical between 64 and 32 kbit/s in steps of 8 kbit/s. The transmitter generates a binary code at 64 kbit/s in which the codes for the lower bit rates are included. The receiver can operate at a lower bit rate with a gradual loss of quality. The scheme of the proposed coder is a compromise between the CELP coder, suited to speech signals, and transform coders, suited to music signals. The perceptual signal is obtained in the time domain. It is then transformed to the frequency domain, where a bit allocation is made and the transform coefficients are quantized. A first solution based on the DFT is analyzed. A solution based on the MDCT with small overlap is then presented. The quantization of the coefficients has the following characteristics. First, a prediction of the whole spectrum is carried out. To quantize the amplitude spectrum, a gain-shape vector quantization (VQ) is used after removal of the mean. To quantize the phase spectrum, a hierarchical two-dimensional VQ is used. The codewords associated with this VQ are partitioned for the different bit rates. Thanks to the hierarchical structure of the codebook, truncated indices can be used without greatly affecting the quality of the reconstructed signal. Simulation results are presented and the problem of robustness is examined.

Author information

Corresponding authors

Correspondence to N. Moreau or P. Dymarski.

About this article

Cite this article

Moreau, N., Dymarski, P. Low delay coder (< 25 ms) of wideband audio (20 Hz-15 kHz) scalable from 64 to 32 kbit/s. Ann. Télécommun. 55, 493–506 (2000). https://doi.org/10.1007/BF02995204

  • DOI: https://doi.org/10.1007/BF02995204
