In distributed and network speech recognition the actual recognition task is not carried out on the user’s terminal but on a remote server in the network. While there are good reasons for doing so, a clear disadvantage of this client-server architecture is that the communication medium may introduce errors, which then impair speech recognition accuracy. Even sophisticated channel coding cannot completely prevent residual bit errors under temporarily adverse channel conditions, and in packet-oriented transmission, packets may arrive too late for the given real-time constraints and have to be declared lost. The goal of error concealment is to reduce the detrimental effect that such errors have on the recipient of the transmitted speech signal by exploiting residual redundancy in the bit stream at the source coder output. In classical speech transmission the recipient is a human, and erroneous data are reconstructed so as to reduce the subjectively annoying effect of corrupted bits or lost packets. Here, however, the recipient is a statistical classifier, which can benefit from knowledge about the quality of the reconstruction. In this chapter we show how the classical Bayesian decision rule needs to be modified to account for uncertain features, and illustrate how the required feature posterior density can be estimated in the case of distributed speech recognition. Some other techniques for error concealment can be related to this approach. Experimental results are given for a small and a medium vocabulary recognition task, each over a channel exhibiting bit errors and over a packet erasure channel.
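As a rough illustration of a decision rule that accounts for uncertain features, the sketch below scores each class by integrating its Gaussian observation model over a Gaussian feature posterior instead of evaluating it at a point estimate. It rests on simplifying assumptions that are not taken from the chapter: diagonal covariances, a flat feature prior, and hypothetical names (`log_gauss_diag`, `uncertain_log_likelihood`, `classify`) and toy parameters chosen only for this example. Under these assumptions the integral has a closed form in which the posterior variance is added to the model variance.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def uncertain_log_likelihood(post_mean, post_var, model_mean, model_var):
    """Log of the integral of N(x; model) * N(x; posterior) over x.

    For diagonal Gaussians this equals a Gaussian evaluated at the
    posterior mean with variance model_var + post_var (assumption:
    flat feature prior, so no prior term appears in the integrand).
    """
    inflated = [mv + pv for mv, pv in zip(model_var, post_var)]
    return log_gauss_diag(post_mean, model_mean, inflated)

def classify(post_mean, post_var, classes):
    """Return the class maximizing log prior + uncertain log-likelihood.

    classes maps a label to (log_prior, mean, var); all toy values.
    """
    def score(label):
        log_prior, mean, var = classes[label]
        return log_prior + uncertain_log_likelihood(post_mean, post_var, mean, var)
    return max(classes, key=score)

# Hypothetical one-dimensional, two-class example with equal priors.
classes = {
    "A": (math.log(0.5), [0.0], [0.25]),
    "B": (math.log(0.5), [2.0], [1.0]),
}

reliable = classify([0.9], [0.0], classes)    # feature received without error
unreliable = classify([0.9], [4.0], classes)  # feature reconstructed, high variance
print(reliable, unreliable)
```

With zero posterior variance the rule reduces to the conventional one. As the variance grows, the class models become flatter and the feature contributes less to the decision, which is the intended behavior when a frame had to be reconstructed after a bit error or packet loss.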

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Reinhold Haeb-Umbach (1)
  • Valentin Ion (1)
  1. Department of Communications Engineering, University of Paderborn, Paderborn, Germany
