Quantization of Speech Features: Source Coding

  • Stephen So
  • Kuldip K. Paliwal

In this chapter, we describe various schemes for quantizing speech features for use in distributed speech recognition (DSR) systems. We analyze the statistical properties of Mel frequency-warped cepstral coefficients (MFCCs) that are most relevant to quantization, namely their correlation and the shape of their probability density function, to determine which type of quantization scheme would quantize them most efficiently. We also determine empirically the relationship between mean squared error and recognition accuracy, to verify that quantization schemes that minimize mean squared error also improve recognition performance. Furthermore, we highlight the importance of noise robustness in DSR and describe the use of a perceptually weighted distance measure to enhance spectral peaks in vector quantization. Finally, we present experimental results for the quantization schemes in a DSR framework and compare their recognition performance.

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Stephen So (1)
  • Kuldip K. Paliwal (1)
  1. Griffith School of Engineering, Signal Processing Laboratory, Griffith University, Australia
