In distributed and network speech recognition, the actual recognition task is carried out not on the user’s terminal but on a remote server in the network. While there are good reasons for doing so, a clear disadvantage of this client-server architecture is that the communication medium may introduce errors, which then impair speech recognition accuracy. Even sophisticated channel coding cannot completely prevent residual bit errors under temporarily adverse channel conditions, and in packet-oriented transmission, packets of data may arrive too late for the given real-time constraints and have to be declared lost. The goal of error concealment is to reduce the detrimental effect that such errors have on the recipient of the transmitted speech signal by exploiting residual redundancy in the bit stream at the source coder output. In classical speech transmission the recipient is a human, and erroneous data are reconstructed so as to reduce the subjectively annoying effect of corrupted bits or lost packets. Here, however, a statistical classifier is at the receiving end, which can benefit from knowledge about the quality of the reconstruction. In this book chapter we show how the classical Bayesian decision rule needs to be modified to account for uncertain features, and illustrate how the required feature posterior density can be estimated in the case of distributed speech recognition. Some other techniques for error concealment can be related to this approach. Experimental results are given for both a small and a medium vocabulary recognition task, and for both a channel exhibiting bit errors and a packet erasure channel.