
Part of the book series: Advances in Pattern Recognition (ACVPR)

In this chapter, we describe various schemes for quantizing speech features for use in distributed speech recognition (DSR) systems. We analyze the statistical properties of Mel frequency-warped cepstral coefficients (MFCCs) that are most relevant to quantization, namely their correlation and the shape of their probability density function, in order to determine which type of quantization scheme is best suited to quantizing them efficiently. We also determine empirically the relationship between mean squared error and recognition accuracy, in order to verify that quantization schemes that minimize mean squared error also improve recognition performance. Furthermore, we highlight the importance of noise robustness in DSR and describe the use of a perceptually weighted distance measure that enhances spectral peaks in vector quantization. Finally, we present experimental results for these quantization schemes in a DSR framework and compare their recognition performance.
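The idea of vector quantization under a perceptually weighted distance can be illustrated as a nearest-codeword search in which each vector component's squared error is scaled by a weight. The sketch below is a minimal illustration under stated assumptions, not the chapter's method: the codebook, the weight values, and the helper `peak_emphasis_weights` (which simply assigns larger weights to larger filterbank energies) are hypothetical, because the abstract does not specify the actual weighting function. The `mean_squared_error` helper computes the plain, unweighted distortion of the kind the chapter relates to recognition accuracy.

```python
from typing import List, Sequence

Vector = Sequence[float]


def weighted_sq_error(x: Vector, c: Vector, w: Vector) -> float:
    """Weighted squared error: d(x, c) = sum_i w_i * (x_i - c_i)**2."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def vq_encode(x: Vector, codebook: List[Vector], w: Vector) -> int:
    """Index of the codeword nearest to x under the weighted distance (first one on ties)."""
    return min(range(len(codebook)), key=lambda k: weighted_sq_error(x, codebook[k], w))


def peak_emphasis_weights(energies: Vector, gamma: float = 0.5) -> List[float]:
    """HYPOTHETICAL weighting: larger weights for larger filterbank energies.

    Assumes strictly positive linear-domain energies; this is an illustration,
    not the weighting function used in the chapter.
    """
    peak = max(energies)
    return [(e / peak) ** gamma for e in energies]


def mean_squared_error(xs: List[Vector], ys: List[Vector]) -> float:
    """Unweighted squared error averaged over all vectors and all components."""
    total = sum(sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(xs, ys))
    return total / (len(xs) * len(xs[0]))


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical 2-D codebook
    x = [0.0, 0.0]
    print(vq_encode(x, codebook, [1.0, 1.0]))   # 0: plain squared error (1 vs 4)
    print(vq_encode(x, codebook, [10.0, 1.0]))  # 1: weighted error (10 vs 4)
    x_hat = [codebook[vq_encode(x, codebook, [1.0, 1.0])]]
    print(mean_squared_error([x], x_hat))       # 0.5
```

In the demo, plain squared error selects codeword 0, while a weight of 10 on the first component makes codeword 1 the nearer one. This shows how a weighting that favors particular spectral components (for example, spectral peaks) can change which codeword represents a frame, even when that choice does not minimize unweighted mean squared error.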




Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

So, S., Paliwal, K.K. (2008). Quantization of Speech Features: Source Coding. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_7


  • DOI: https://doi.org/10.1007/978-1-84800-143-5_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-142-8

  • Online ISBN: 978-1-84800-143-5

  • eBook Packages: Computer Science, Computer Science (R0)
