Chapter

Automatic Speech Recognition on Mobile Devices and over Communication Networks

Part of the series Advances in Pattern Recognition pp 327-346

Handheld Speech to Speech Translation System

  • Yuqing GaoAffiliated withIBM T. J. Watson Research Center
  • , Bowen ZhouAffiliated withIBM T. J. Watson Research Center
  • , Weizhong ZhuAffiliated withIBM T. J. Watson Research Center
  • , Wei ZhangAffiliated withIBM T. J. Watson Research Center

* Final gross prices may vary according to local VAT.

Get Access

Recent Advances in the processing capabilities of handheld devices (PDAs or mobile phones) have provided the opportunity for enablement of speech recognition system, and even end-to-end speech translation system on these devices. However, two-way free-form speech-to-speech translation (as opposite to fixed phrase translation) is a highly complex task. A large amount of computation is involved to achieve reliable transformation performance. Resource limitations are not just CPU speed, but also the memory and storage requirements, and the audio input and output requirements all tax current systems to their limits. When the resource demand exceeds the computational capability of available state-of-the-art hand-held devices, a common technique for mobile speech-to-speech translation system is to use a client-server approach, where the handheld device (a mobile phone or PDA) is treated simply as a system client. While we will briefly describe the client/server approach, we will mainly focus on the approach that the end-to-end speech-to-speech translation system is completely hosted on the handheld devices. We will describe the challenges and algorithm and code optimization solutions we developed for the handheld MASTOR systems (Multilingual Automatic Speech-to-Speech Translator) for between English and Mandarin Chinese, and between English and Arabic on embedded Linux and Windows CE operating systems. The system includes an HMM-based large vocabulary continuous speech recognizer using statistical n-grams, a translation module, and a multi-language speech synthesis system.