Low bit-rate speech coding based on multicomponent AFM signal model
In this paper, we propose a novel multicomponent amplitude and frequency modulated (AFM) signal model for parametric representation of speech phonemes. An efficient technique is developed for parameter estimation of the proposed model. The Fourier–Bessel series expansion is used to separate a multicomponent speech signal into a set of individual components. The discrete energy separation algorithm is used to extract the amplitude envelope (AE) and the instantaneous frequency (IF) of each component of the speech signal. Then, the parameter estimation of the proposed AFM signal model is carried out by analysing the AE and IF parts of the signal component. The developed model is found to be suitable for representation of an entire speech phoneme (voiced or unvoiced) irrespective of its time duration, and the model is shown to be applicable for low bit-rate speech coding. The symmetric Itakura–Saito and the root-mean-square log-spectral distance measures are used for comparison of the original and reconstructed speech signals.
Keywords: Parametric model; Non-stationary signal analysis; Multi-tone amplitude and frequency modulation; Fourier–Bessel expansion; Discrete energy separation algorithm; Speech coding
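The abstract names the discrete energy separation algorithm as the step that extracts the amplitude envelope (AE) and instantaneous frequency (IF) of each separated component. As a rough illustration only, the sketch below implements the commonly published DESA-2 variant built on the Teager–Kaiser energy operator. The abstract does not say which DESA variant the authors use, so the choice of DESA-2, the function names `teager_kaiser` and `desa2`, and the `eps` guard are assumptions made for this example, not the paper's implementation.

```python
import numpy as np


def teager_kaiser(x):
    """Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    The result is aligned with x[1:-1], so it is two samples shorter than x.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """Estimate amplitude envelope and instantaneous frequency with DESA-2.

    Assumes x is a single (monocomponent) AM-FM signal, e.g. one component
    obtained after a separation step. Returns (amp, omega), where omega is
    in radians per sample. Both outputs are aligned with x[2:-2].
    `eps` is a hypothetical guard against division by zero.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)[1:-1]      # energy of x, trimmed to x[2:-2]
    y = x[2:] - x[:-2]                  # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_y = teager_kaiser(y)            # energy of y, already aligned with x[2:-2]

    # omega(n) ~ 0.5 * arccos(1 - psi[y](n) / (2 * psi[x](n)))
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))

    # |a(n)| ~ 2 * psi[x](n) / sqrt(psi[y](n))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega
```

For a pure tone `x = 0.8 * np.cos(0.3 * np.arange(200) + 0.1)`, `desa2(x)` should return an amplitude close to 0.8 and a frequency close to 0.3 rad/sample at every interior sample. DESA-2 resolves frequencies only up to a quarter of the sampling rate, and real speech components usually need smoothing of the energy estimates, which this sketch omits.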