Automatic Speech Recognition on Mobile Devices and over Communication Networks

Part of the series Advances in Pattern Recognition pp 27-39

Speech Coding and Packet Loss Effects on Speech and Speaker Recognition

  • Laurent Besacier, LIG Laboratory, University J. Fourier

This chapter addresses the speech coding and packet loss problems that arise in network speech recognition, where speech is transmitted (and, most of the time, coded) from a client terminal to a recognition server. The first part describes some commonly used speech coding standards and presents a packet loss model for evaluating different channel degradation conditions in a controlled fashion. The second part evaluates the influence of different speech and audio codecs on the performance of a continuous speech recognition engine. It shows that MPEG transcoding degrades speech recognition performance at low bit rates, whereas performance remains acceptable for specialized speech coders such as G723. The same system is also evaluated under different simulated and real packet loss conditions, and the resulting significant degradation of automatic speech recognition (ASR) performance is analyzed. The third part presents an overview of the joint effects of compression and packet loss on speech biometrics. In contrast to the ASR task, it is experimentally demonstrated that the adverse effects of packet loss alone are negligible, while speech encoding, particularly at a low bit rate, coupled with packet loss can reduce speaker recognition accuracy considerably. The fourth part discusses these experimental observations and points to robustness approaches.
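
The abstract does not say which packet loss model the chapter uses. As an illustration only, the sketch below assumes a two-state Gilbert model, a common choice for simulating bursty loss in a controlled way. The function names and the parameters `p` and `q` are hypothetical and are not taken from the chapter.

```python
import random


def gilbert_loss_mask(n_packets, p, q, seed=None):
    """Simulate packet loss with a two-state Gilbert model (assumed here).

    p: probability of moving from the 'received' state to the 'lost' state
    q: probability of moving from the 'lost' state back to 'received'
    Returns a list of booleans, True where the packet is lost.
    """
    rng = random.Random(seed)
    lost = False  # start in the 'received' state
    mask = []
    for _ in range(n_packets):
        if lost:
            # Stay in the 'lost' state with probability 1 - q.
            lost = rng.random() >= q
        else:
            # Enter the 'lost' state with probability p.
            lost = rng.random() < p
        mask.append(lost)
    return mask


def stationary_loss_rate(p, q):
    """Long-run fraction of lost packets for the two-state chain."""
    return p / (p + q)


def apply_loss(frames, mask, fill=None):
    """Replace each lost frame with `fill` (None marks a gap)."""
    return [fill if is_lost else frame for frame, is_lost in zip(frames, mask)]


if __name__ == "__main__":
    mask = gilbert_loss_mask(n_packets=20, p=0.05, q=0.45, seed=0)
    frames = list(range(20))  # stand-in for coded speech frames
    print(apply_loss(frames, mask))
    print("expected long-run loss rate:", stationary_loss_rate(0.05, 0.45))
```

Under this assumed model, the mean loss burst length is `1 / q`, so the overall loss rate and the burstiness can be varied independently when defining test conditions.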