Quantization of Speech Features: Source Coding

So, Stephen; Paliwal, Kuldip K.

doi:10.1007/978-1-84800-143-5_7


Part of the book series: Advances in Pattern Recognition (ACVPR)


In this chapter, we describe various schemes for quantizing speech features for use in distributed speech recognition (DSR) systems. We analyze the statistical properties of Mel frequency-warped cepstral coefficients (MFCCs) that are most relevant to quantization, namely their correlation and the shape of their probability density function, to determine which type of quantization scheme would quantize them most efficiently. We also determine empirically the relationship between mean squared error and recognition accuracy, in order to verify that quantization schemes that minimize mean squared error also improve recognition performance. Furthermore, we highlight the importance of noise robustness in DSR and describe the use of a perceptually weighted distance measure to enhance spectral peaks in vector quantization. Finally, we present experimental results for the quantization schemes in a DSR framework and compare their recognition performance.
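As a rough illustration of the weighted-distance idea mentioned in the abstract, the following minimal sketch shows a nearest-codeword search in vector quantization under a weighted squared-error measure. It is not the chapter's method: the function names, the toy codebook, and the weight values are all hypothetical, and the actual perceptual weighting used by the authors is not given in this abstract.

```python
from typing import Sequence


def weighted_sq_dist(x: Sequence[float], c: Sequence[float],
                     w: Sequence[float]) -> float:
    """Weighted squared error: sum_i w[i] * (x[i] - c[i])**2."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def quantize(x: Sequence[float], codebook: Sequence[Sequence[float]],
             weights: Sequence[float]) -> int:
    """Return the index of the codeword closest to x under the weighted measure."""
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_dist(x, codebook[i], weights))


# Toy two-dimensional codebook and feature vector (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
x = [2.4, 0.0]

# Equal weights reduce the measure to plain squared error.
idx_plain = quantize(x, codebook, [1.0, 1.0])

# Down-weighting the second component (assumed weights) changes the winner.
idx_weighted = quantize(x, codebook, [1.0, 0.1])

print(idx_plain, idx_weighted)
```

With equal weights the search is ordinary minimum mean squared error quantization. Unequal weights let errors in selected components count for more or less, which is the general mechanism a perceptually weighted distance measure relies on.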



Author information

Authors and Affiliations

Griffith School of Engineering, Signal Processing Laboratory, Griffith University, QLD, 4222, Australia
Stephen So & Kuldip K. Paliwal



About this chapter

Cite this chapter

So, S., Paliwal, K.K. (2008). Quantization of Speech Features: Source Coding. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_7


DOI: https://doi.org/10.1007/978-1-84800-143-5_7
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)
