In this chapter, we describe various schemes for quantizing speech features for use in distributed speech recognition (DSR) systems. We analyze the statistical properties of Mel frequency-warped cepstral coefficients (MFCCs) that are most relevant to quantization, namely their correlation and the shape of their probability density function, in order to determine which type of quantization scheme would quantize them most efficiently. We also determine empirically the relationship between mean squared error and recognition accuracy, in order to verify that quantization schemes that minimize mean squared error can also be expected to improve recognition performance. Furthermore, we highlight the importance of noise robustness in DSR and describe the use of a perceptually weighted distance measure to emphasize spectral peaks in vector quantization. Finally, we present experimental results for the quantization schemes in a DSR framework and compare their recognition performance.
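As a rough illustration of the two quantities mentioned above, the following Python sketch performs a nearest-neighbour codebook search under a weighted squared-error distance and then reports the ordinary (unweighted) mean squared error of the result. It is a minimal sketch under stated assumptions, not the chapter's method: the codebook, the feature vector, and the weights are made-up values, and the weights here are arbitrary rather than perceptually derived.

```python
# Illustrative only: weighted nearest-neighbour search in a vector quantizer.
# The codebook, input vector, and weights are hypothetical values chosen for
# the example; they are not taken from the chapter.

def weighted_sq_dist(x, c, w):
    """Weighted squared Euclidean distance: sum_i w[i] * (x[i] - c[i])**2."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))

def quantize(x, codebook, w):
    """Return (index, code vector) of the entry closest to x under weights w."""
    index = min(range(len(codebook)),
                key=lambda k: weighted_sq_dist(x, codebook[k], w))
    return index, codebook[index]

def mean_squared_error(x, y):
    """Unweighted mean squared error between a vector and its reconstruction."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
x = [0.7, 0.4]
uniform = [1.0, 1.0]     # plain squared error
emphasis = [1.0, 10.0]   # hypothetical weights stressing the second component

idx_u, rec_u = quantize(x, codebook, uniform)
idx_w, rec_w = quantize(x, codebook, emphasis)

print("uniform weights :", idx_u, mean_squared_error(x, rec_u))
print("emphasis weights:", idx_w, mean_squared_error(x, rec_w))
```

With these toy values the two searches select different code vectors, and the weighted search accepts a larger unweighted mean squared error in exchange for a closer match on the emphasized component. This is one reason the relationship between mean squared error and recognition accuracy has to be checked empirically rather than assumed.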
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
So, S., Paliwal, K.K. (2008). Quantization of Speech Features: Source Coding. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_7
DOI: https://doi.org/10.1007/978-1-84800-143-5_7
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science (R0)