Speech Recognition Over Mobile Networks

  • Hong Kook Kim
  • Richard C. Rose
Part of the Advances in Pattern Recognition book series (ACVPR)

This chapter addresses issues associated with automatic speech recognition (ASR) over mobile networks and introduces several techniques for improving recognition performance. A central issue is the degradation of ASR performance caused by three sources: distortions produced by the speech coding algorithms employed in mobile communication systems, transmission errors occurring over mobile telephone channels, and ambient background noise, which can be particularly severe in mobile domains. Speech coding algorithms in particular have difficulty modeling speech in noisy environments. To overcome this problem, noise reduction techniques can be integrated into speech coding algorithms to improve reconstructed speech quality under ambient noise conditions, or speech coding parameters can be made more robust to ambient noise. As an alternative to mitigating the effects of speech coding distortions in the received speech signal, a bitstream-based framework has been proposed, in which speech coding parameters are transformed directly into speech recognition parameters as a means of improving ASR performance. Furthermore, it is suggested that speech coding parameters can be enhanced at the receiver side using either an adaptation algorithm or model compensation. Finally, the chapter discusses techniques for reducing the effects of channel errors, including frame erasure concealment for ASR, soft-decoding, and missing feature theory-based ASR decoding.
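As a rough illustration of what a direct transformation from speech coding parameters to recognition parameters can look like, the sketch below converts linear prediction (LPC) coefficients into LPC cepstral coefficients with the standard recursion. It is a minimal sketch under stated assumptions, not the chapter's own front-end. It assumes the all-pole model 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and it assumes the LPC coefficients have already been recovered from the bitstream (coders such as G.729 transmit quantized line spectral parameters, which would first have to be converted). The function name `lpc_to_cepstrum` is hypothetical.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Return cepstral coefficients c_1..c_n_ceps of the all-pole model 1/A(z).

    Assumed sign convention: A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.
    The gain term (c_0) is not computed.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused so that c[n] matches c_n
    for n in range(1, n_ceps + 1):
        # Sum over k of k * c_k * a_{n-k}, restricted to 1 <= n-k <= p.
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += k * c[k] * a[n - k - 1]
        a_n = a[n - 1] if n <= p else 0.0  # a_n is zero beyond the model order
        c[n] = -a_n - acc / n
    return c[1:]


if __name__ == "__main__":
    # First-order example: A(z) = 1 - 0.5 z^-1 (a single pole at z = 0.5).
    print(lpc_to_cepstrum([-0.5], 4))
```

For the first-order example, the cepstrum has the closed form c_n = 0.5^n / n, which gives a quick sanity check on the recursion. Because the cepstrum is computed from parameters the decoder already holds, a front-end of this kind does not need to resynthesize the speech waveform before feature extraction.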

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Hong Kook Kim (1)
  • Richard C. Rose (2)

  1. Department of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, Korea
  2. Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
