
Scalable perceptual audio representation with an adaptive three time-scale sinusoidal signal model

Journal of Electronics (China)

Abstract

This work concerns the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is described. The paper's main contribution is a fundamental enhancement to the sinusoidal modeling component: overlap-add sinusoidal modeling is carried out at three successive time scales, large, medium, and small. The modeling is done in an analysis-by-synthesis overlap-add manner across the three scales using psychoacoustically weighted matching pursuit. The sinusoidal modeling residual at the first (large) scale is passed to the smaller scales, so that signal features are modeled at appropriate time resolutions. This approach helps considerably to correct the pre-echo inherent in the sinusoidal model, and it improves perceptual audio quality over our previous sinusoidal modeling work while using the same number of sinusoids. The most obvious applications of the SN model are scalable, high-fidelity audio coding and signal modification.
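The residual-passing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain energy-based matching pursuit in place of the psychoacoustically weighted version, omits the noise component, and restricts frequencies to the DFT grid of each frame. The frame lengths (256, 64, 16), the Hann window, the 50% overlap, the two sinusoids per frame, and all function names are assumptions made for illustration only.

```python
import math

def build_dictionary(n):
    """Hann-windowed cosine/sine atom pairs on the DFT grid of an n-sample frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    entries = []
    for k in range(1, n // 2):  # skip DC and Nyquist, where the sine atom vanishes
        w = 2.0 * math.pi * k / n
        c = [window[i] * math.cos(w * i) for i in range(n)]
        s = [window[i] * math.sin(w * i) for i in range(n)]
        gram = (sum(x * x for x in c),
                sum(x * y for x, y in zip(c, s)),
                sum(y * y for y in s))
        entries.append((k, c, s, gram))
    return entries

def pursue_once(residual, start, n, dictionary):
    """Remove the best-fitting atom pair from residual[start:start+n], in place."""
    seg = residual[start:start + n]
    best_gain, best = 0.0, None
    for k, c, s, (cc, cs, ss) in dictionary:
        pc = sum(x * y for x, y in zip(seg, c))
        ps = sum(x * y for x, y in zip(seg, s))
        det = cc * ss - cs * cs
        a = (pc * ss - ps * cs) / det  # least-squares cosine amplitude
        b = (ps * cc - pc * cs) / det  # least-squares sine amplitude
        gain = a * pc + b * ps         # energy removed by this fit
        if gain > best_gain:
            best_gain, best = gain, (k, a, b, c, s)
    if best is None:
        return None
    k, a, b, c, s = best
    for i in range(n):
        residual[start + i] -= a * c[i] + b * s[i]
    return (n, start, k, a, b)

def energy(x):
    return sum(v * v for v in x)

def three_scale_model(signal, frame_lengths=(256, 64, 16), sines_per_frame=2):
    """Model the signal at large, medium, then small scale, passing the residual on."""
    residual = list(signal)
    atoms, energies = [], [energy(residual)]
    for n in frame_lengths:
        dictionary = build_dictionary(n)
        for start in range(0, len(residual) - n + 1, n // 2):  # 50% overlap
            for _ in range(sines_per_frame):
                atom = pursue_once(residual, start, n, dictionary)
                if atom is None:
                    break
                atoms.append(atom)
        energies.append(energy(residual))  # residual energy after this scale
    return atoms, residual, energies

def synthesize(atoms, length):
    """Overlap-add all windowed sinusoids back into one signal."""
    out = [0.0] * length
    for n, start, k, a, b in atoms:
        for i in range(n):
            win = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)
            phase = 2.0 * math.pi * k * i / n
            out[start + i] += win * (a * math.cos(phase) + b * math.sin(phase))
    return out

def demo_signal(length=512):
    """A steady tone plus a single click, as a stand-in for tonal plus transient audio."""
    x = [math.sin(2.0 * math.pi * 0.05 * i) for i in range(length)]
    x[300] += 1.0
    return x

atoms, residual, energies = three_scale_model(demo_signal())
print([round(e, 3) for e in energies])  # residual energy: input, then after each scale
```

Because every atom is windowed and subtracted from a shared residual, each new fit accounts for what overlapping frames have already removed, which is the overlap-add, analysis-by-synthesis flavor of the scheme in a simplified form. Short frames at the last scale have atoms that cannot spread energy far ahead of a transient, which is the intuition behind the pre-echo correction described above.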



Additional information

Supported by the National Natural Science Foundation of China (No.69802007), Motorola China Research Center (No.B38300), and Natural Science Foundation of Guangdong (No.011611)

About this article

Cite this article

Raed, AM., Yin, J. & Song, S. Scalable perceptual audio representation with an adaptive three time-scale sinusoidal signal model. J. of Electron.(China) 21, 213–221 (2004). https://doi.org/10.1007/BF02687874

  • DOI: https://doi.org/10.1007/BF02687874
