Phonetic Segmentation for Low Rate Speech Coding

Wang, Shihua; Gersho, Allen

doi:10.1007/978-1-4615-3266-8_22

Chapter

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 114)

Abstract

Efforts to bridge the gap between waveform coders and vocoders have led to a new class of hybrid speech coders. These coders perform analysis-by-synthesis encoding of an excitation signal and reconstruct speech from the coded excitation signal and a quantized time-varying filter model of speech production. The most notable of these coders are those that use vector quantization to code the excitation signal as a sequence of vectors. The coding technique is called Code Excited Linear Prediction (CELP) [1] or Vector Excitation Coding (VXC) [2]. VXC coders produce coded speech with a waveform approximating the original and achieve a satisfactory, natural-sounding quality at bit rates as low as 4.8 kb/s. When the bit rate is reduced below 4.8 kb/s, the quality of VXC coders degrades rapidly and becomes inferior to the synthetic quality of an LPC vocoder operating at 2.4 kb/s. There remains, then, the challenging problem of finding an algorithm that at 2.4 kb/s (or even at 3.6 kb/s) will achieve the quality that VXC offers at 4.8 kb/s.
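The analysis-by-synthesis idea in the abstract can be illustrated with a minimal sketch. It is not the coder described in the chapter: it assumes a fixed all-pole synthesis filter with zero initial state, an unweighted squared-error criterion, and a small random codebook. The names `synthesize` and `search_codebook`, the filter coefficients, and the codebook size are all hypothetical. Practical CELP/VXC coders also use perceptual weighting and a long-term (pitch) predictor, and they remove the filter's response to past excitation before the search; all of that is omitted here.

```python
import random


def synthesize(excitation, lpc):
    """Pass an excitation vector through an all-pole filter 1/A(z).

    Assumed convention: s[n] = e[n] + sum_k lpc[k-1] * s[n-k], zero initial state.
    """
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


def search_codebook(target, codebook, lpc):
    """Analysis-by-synthesis search: synthesize every candidate excitation,
    fit the optimal gain, and keep the candidate with the least squared error.

    Returns (index, gain, error), or None if every candidate is silent.
    """
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero output cannot match any target
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical setup: 2nd-order stable filter, 16 Gaussian codevectors of length 20.
rng = random.Random(0)
lpc = [1.2, -0.5]
codebook = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(16)]

# Build a target that is exactly codevector 3, scaled by 0.5 and filtered.
target = [0.5 * v for v in synthesize(codebook[3], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
print(index, round(gain, 6), error)
```

Because the target here is built from codevector 3, the search should select index 3 with a gain close to 0.5 and a near-zero error. In a real coder the target is a frame of (weighted) input speech, and only the selected index and a quantized gain would be transmitted.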

References

[1] M. R. Schroeder and B. S. Atal, “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 937–940, Tampa, March 1985.
[2] G. Davidson and A. Gersho, “Complexity Reduction Methods for Vector Excitation Coding,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 3055–3058, Tokyo, Japan, April 1986.
[3] P. Kroon and B. S. Atal, “Strategies for Improving the Performance of CELP Coders at Low Bit Rates,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 151–154, New York City, April 1988.
[4] M. Yong and A. Gersho, “Vector Excitation Coding with Dynamic Bit Allocation,” Proceedings of IEEE International Conference on Communication, vol. 1, pp. 290–294, Florida, November 1988.
[5] N. S. Jayant and J. H. Chen, “Speech Coding with Time-Varying Bit Allocation to Excitation and LPC Parameters,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 65–68, May 1989.
[6] T. Taniguchi, S. Unagami, and R. Gray, “Multimode Coding: Application to CELP,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 156–159, May 1989.
[7] S. Roucos, R. M. Schwartz, and J. Makhoul, “A Segment Vocoder at 150 b/s,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 61–64, Boston, April 1983.
[8] M. Copperi, “Rule-Based Speech Analysis and Application to CELP Coding,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 143–146, New York City, April 1988.
[9] S. Ono and K. Ozawa, “2.4 Kbps Pitch Prediction Multi-pulse Speech Coding,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 175–178, New York City, April 1988.
[10] S. Wang and A. Gersho, “Phonetically-Based Vector Excitation Coding of Speech at 3.6 kbit/s,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Glasgow, May 1989.
[11] O. Fujimura and K. Ozawa, “High-quality Speech Coding Using Multiple Types of Excitation Signals at 4.8 kb/s and Below,” Advances in Speech Coding, Kluwer Academic Publishers, 1990.
[12] T. Liu and H. Hoege, “Phonetically-based LPC vector quantization of high quality speech,” Eurospeech 89, section 39.4, Paris, September 1989.
[13] G. Davidson, M. Yong, and A. Gersho, “Real-Time Vector Excitation Coding of Speech At 4800 bps,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2189–2192, Dallas, April 1987.
[14] M. Yong, G. Davidson, and A. Gersho, “Encoding of LPC Spectral Parameters Using Switched-Adaptive Interframe Vector Prediction,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 402–405, New York City, April 1988.
[15] S. Singhal and B. S. Atal, “Improving Performance of Multi-Pulse LPC coders at Low Bit Rates,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1.3.1–1.3.4, San Diego, 1984.
[16] M. Yong and A. Gersho, “Efficient Encoding of the Long-term Predictor in Vector Excitation Coders,” Advances in Speech Coding, Kluwer Academic Publishers, 1990.

Author information

Authors and Affiliations

Center for Information Processing Research, Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106, USA
Shihua Wang & Allen Gersho

Editor information

Editors and Affiliations

AT&T Bell Laboratories, USA
Bishnu S. Atal
Simon Fraser University, Canada
Vladimir Cuperman
University of California, Santa Barbara, USA
Allen Gersho

About this chapter

Cite this chapter

Wang, S., Gersho, A. (1991). Phonetic Segmentation for Low Rate Speech Coding. In: Atal, B.S., Cuperman, V., Gersho, A. (eds) Advances in Speech Coding. The Springer International Series in Engineering and Computer Science, vol 114. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-3266-8_22

DOI: https://doi.org/10.1007/978-1-4615-3266-8_22
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6437-5
Online ISBN: 978-1-4615-3266-8
eBook Packages: Springer Book Archive