Embedded and mobile systems have two main requirements: low power consumption, for long battery life and miniaturization, and low unit cost, for components produced in very large numbers (cell phones, set-top boxes). Both requirements are addressed by CPUs with integer-only arithmetic units, which motivates the fixed-point implementation of automatic speech recognition (ASR) algorithms. Large vocabulary continuous speech recognition (LVCSR) can greatly enhance the usability of devices whose small size and typical on-the-go use hinder more traditional interfaces. The increasing computational power of embedded CPUs will soon allow real-time LVCSR on portable, low-cost devices. This chapter reviews the problems of implementing ASR algorithms in fixed point and presents fixed-point methods that yield the same recognition accuracy as the floating-point algorithms. In particular, the chapter illustrates a practical approach to implementing the frame-synchronous beam-search Viterbi decoder, N-gram language models, HMM likelihood computation, and the mel-cepstrum front-end. The fixed-point recognizer is shown to be as accurate as the floating-point recognizer in several LVCSR experiments, on the DARPA Switchboard task and on an AT&T proprietary task, using different types of acoustic front-ends, HMMs, and language models. Experiments on the DARPA Resource Management task, using the StrongARM-1100 206 MHz and the XScale PXA270 624 MHz CPUs, show that the fixed-point implementation enables real-time performance: the floating-point recognizer, which relies on floating-point software emulation, is several times slower for the same accuracy.

