Abstract
An analysis/synthesis system based on the sinusoidal speech model has been developed [1]. In that system, the sine-wave amplitudes and frequencies are located by searching for the peaks of the magnitude of the short-time Fourier transform (STFT) of the input speech. The phases are computed from the real and imaginary parts of the STFT at the measured frequencies. The frequencies on successive frames are matched, used in a cubic phase interpolator, and applied to a sine-wave generator. Each sine wave is amplitude-modulated by the linear interpolation of the matched sine-wave amplitudes. At a 10 ms frame rate, this system produces speech that is perceptually indistinguishable from the original [1]. Since it is not possible to code all of the sine-wave parameters at low data rates, a system has been developed that codes the sine-wave frequencies by fitting a harmonic set of sine waves to the input waveform using a modified mean-squared error criterion [2], and codes the phase information implicitly using a voicing-adaptive transition frequency to provide for a mixed voiced/unvoiced phase excitation model [3]. Provided a postfilter is used at the synthesizer to attenuate the noise in the formant nulls, the speech synthesized by this system is of quite high quality, having achieved a Diagnostic Acceptability Measure (DAM) score of 63.0 in the uncoded mode. Since the fundamental frequency can be coded using ≈ 7 bits and the voicing measure using ≈ 3 bits, good speech quality at low data rates is possible provided the sine-wave amplitudes can be coded efficiently. This paper describes the zero-phase harmonic analysis/synthesis system and the postfilter design methodology, and then discusses the various techniques that have been examined for coding the sine-wave amplitudes.
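The analysis step described above (peak-picking the STFT magnitude, then reading the phase from the real and imaginary parts at each peak) can be sketched for a single frame as follows. This is only an illustrative sketch, not the authors' implementation: the function name `sine_wave_peaks`, the Hamming window, the FFT length, the `max_peaks` limit, and the choice to reference phase to the first sample of the frame are all assumptions made for the example.

```python
import numpy as np


def sine_wave_peaks(frame, sample_rate, n_fft=1024, max_peaks=80):
    """Estimate sine-wave amplitudes, frequencies (Hz) and phases for one frame.

    Hypothetical helper: window type, FFT size and peak limit are assumptions.
    """
    window = np.hamming(len(frame))
    # Scale so that a unit-amplitude sinusoid gives a peak magnitude near 1.
    spectrum = np.fft.rfft(frame * window, n_fft) * (2.0 / window.sum())
    magnitude = np.abs(spectrum)

    # Local maxima of the magnitude spectrum (interior bins only).
    interior = magnitude[1:-1]
    is_peak = (interior > magnitude[:-2]) & (interior >= magnitude[2:])
    peak_bins = np.flatnonzero(is_peak) + 1

    # Keep the strongest peaks, then restore ascending frequency order.
    strongest = peak_bins[np.argsort(magnitude[peak_bins])[::-1][:max_peaks]]
    peak_bins = np.sort(strongest)

    freqs_hz = peak_bins * sample_rate / n_fft
    amps = magnitude[peak_bins]
    # Phase from the real and imaginary parts of the STFT at the peak bins,
    # referenced to the first sample of the frame (an assumption).
    phases = np.arctan2(spectrum[peak_bins].imag, spectrum[peak_bins].real)
    return amps, freqs_hz, phases
```

Calling `sine_wave_peaks(frame, 8000)` on a 25 ms frame returns three arrays of equal length, one entry per retained peak. Frame-to-frame frequency matching and the cubic phase interpolation are not covered by this sketch.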
This work was sponsored by the Department of the Air Force.
References
[1] R. J. McAulay and T. F. Quatieri, “Speech Analysis/Synthesis Based on a Sinusoidal Representation,” IEEE Trans. Acoust., Speech and Signal Proc., Vol. ASSP-34, No. 4, p. 744, August 1986.
[2] R. J. McAulay and T. F. Quatieri, “Pitch Estimation and Voicing Detection Based on a Sinusoidal Speech Model,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’90), Albuquerque, NM, April 1990.
[3] J. Makhoul, R. Viswanathan, R. Schwartz, and A. W. F. Huggins, “A Mixed-Source Model for Speech Compression and Synthesis,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’78), Tulsa, OK, p. 163, April 1978.
[4] R. J. McAulay and T. F. Quatieri, “Phase Modeling and Its Application to Sinusoidal Transform Coding,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’86), Tokyo, Japan, p. 1713, April 1986.
[5] R. J. McAulay and T. F. Quatieri, “Phase Coherence in Speech Reconstruction for Enhancement and Coding Applications,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’89), Glasgow, Scotland, p. 207, May 1989.
[6] J.-H. Chen and A. Gersho, “Real-Time Vector APC Speech Coding at 4800 b/s with Adaptive Postfiltering,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’87), Dallas, TX, p. 51.3.1, April 1987.
[7] D. B. Paul, “The Spectral Envelope Estimation Vocoder,” IEEE Trans. Acoust., Speech and Signal Proc., Vol. ASSP-29, No. 4, p. 786, August 1981.
[8] J. N. Holmes, “The JSRU Channel Vocoder,” Proc. Inst. Elect. Eng., Vol. 127, Pt. F, February 1980.
[9] M. J. Sabin, “DPCM Coding of Spectral Amplitudes without Positive Slope Overload,” IEEE Trans. Acoust., Speech and Signal Proc. (to appear).
[10] F. Itakura and S. Saito, “A Statistical Method for Estimation of Speech Spectral Density and Formant Frequencies,” Electron. Commun. Japan, Vol. 53-A, p. 36, 1970.
[11] R. J. McAulay and T. Champion, “Improved Interoperable 2.4 kb/s LPC Using Sinusoidal Transform Coder Techniques,” Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc. (ICASSP’90), Albuquerque, NM, April 1990.
Copyright information
© 1991 Springer Science+Business Media New York
About this chapter
Cite this chapter
McAulay, R., Parks, T., Quatieri, T., Sabin, M. (1991). Sine-Wave Amplitude Coding at Low Data Rates. In: Atal, B.S., Cuperman, V., Gersho, A. (eds) Advances in Speech Coding. The Springer International Series in Engineering and Computer Science, vol 114. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-3266-8_20
DOI: https://doi.org/10.1007/978-1-4615-3266-8_20
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6437-5
Online ISBN: 978-1-4615-3266-8
eBook Packages: Springer Book Archive