Quantization of Speech Features: Source Coding
In this chapter, we describe various schemes for quantizing speech features for use in distributed speech recognition (DSR) systems. We analyze the statistical properties of Mel frequency-warped cepstral coefficients (MFCCs) that are most relevant to quantization, namely their correlation and the shape of their probability density function (PDF), in order to determine which type of quantization scheme is best suited to quantizing them efficiently. We also empirically determine the relationship between mean squared error and recognition accuracy, in order to verify that quantization schemes that minimize mean squared error also improve recognition performance. Furthermore, we highlight the importance of noise robustness in DSR and describe the use of a perceptually weighted distance measure that enhances spectral peaks in vector quantization. Finally, we present experimental results for these quantization schemes in a DSR framework and compare their recognition performance.
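The abstract does not specify the weighting used in the perceptually weighted distance measure. The following is therefore only a minimal sketch of full-search vector quantization under an assumed weighting: each filterbank channel is weighted by its own (positive, linear-scale) energy raised to a power `gamma`, so that errors at spectral peaks cost more than errors in spectral valleys. The functions `weighted_sq_error`, `peak_weights`, `vq_encode`, `mean_squared_error` and the parameter `gamma` are hypothetical illustrations, not the chapter's method. With unit weights the search picks the codeword with the smallest squared error; with peak weights it can pick a different codeword that reproduces the peak more faithfully.

```python
"""Sketch: full-search vector quantization with an optional per-dimension
weighting. All names and the peak-weighting rule are hypothetical."""
from typing import List, Sequence

Vector = Sequence[float]


def weighted_sq_error(x: Vector, c: Vector, w: Vector) -> float:
    # Weighted squared error: sum_k w_k * (x_k - c_k)^2.
    # With every w_k == 1 this is the plain squared Euclidean distance,
    # i.e. the criterion a mean-squared-error quantizer minimizes.
    return sum(wk * (xk - ck) ** 2 for xk, ck, wk in zip(x, c, w))


def peak_weights(x: Vector, gamma: float = 1.0) -> List[float]:
    # Hypothetical peak-emphasizing weights: each channel is weighted by its
    # own energy raised to gamma. Assumes positive, linear-scale energies.
    return [xk ** gamma for xk in x]


def vq_encode(x: Vector, codebook: Sequence[Vector], w: Vector) -> int:
    # Exhaustive search: return the index of the codeword with the smallest
    # weighted error; in DSR only this index would be transmitted.
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_error(x, codebook[i], w))


def mean_squared_error(vectors: Sequence[Vector],
                       decoded: Sequence[Vector]) -> float:
    # Average squared error per component between original and decoded vectors.
    total = sum(weighted_sq_error(x, y, [1.0] * len(x))
                for x, y in zip(vectors, decoded))
    n = sum(len(x) for x in vectors)
    return total / n


if __name__ == "__main__":
    x = [9.0, 1.0, 1.0]           # hypothetical energies with a peak in channel 0
    codebook = [[9.0, 3.0, 3.0],  # reproduces the peak, misses the valleys
                [7.0, 1.0, 1.0]]  # reproduces the valleys, misses the peak
    print(vq_encode(x, codebook, [1.0] * 3))        # 1: plain squared-error choice
    print(vq_encode(x, codebook, peak_weights(x)))  # 0: peak-weighted choice
```

In this toy case the unweighted errors are 8 and 4, so the plain search picks codeword 1; with weights `[9, 1, 1]` the errors become 8 and 36, so the weighted search keeps the codeword that matches the spectral peak at the cost of a larger unweighted error.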