
Part of the book series: Advances in Pattern Recognition (ACVPR)

Recent advances in the processing capabilities of handheld devices (PDAs and mobile phones) have made it possible to run speech recognition systems, and even end-to-end speech translation systems, on these devices. However, two-way free-form speech-to-speech translation (as opposed to fixed-phrase translation) is a highly complex task, and achieving reliable translation performance requires a large amount of computation. CPU speed is not the only limitation. Memory and storage requirements, together with audio input and output requirements, all tax current systems to their limits. When the resource demand exceeds the computational capability of state-of-the-art handheld devices, mobile speech-to-speech translation systems commonly use a client-server approach, in which the handheld device (a mobile phone or PDA) acts simply as a system client. We briefly describe the client-server approach, but we focus mainly on hosting the entire end-to-end speech-to-speech translation system on the handheld device itself. We describe the challenges we faced and the algorithm and code optimizations we developed for the handheld MASTOR (Multilingual Automatic Speech-to-Speech Translator) systems. These systems translate between English and Mandarin Chinese and between English and Arabic, and they run on the embedded Linux and Windows CE operating systems. The system includes an HMM-based large-vocabulary continuous speech recognizer using statistical n-gram language models, a translation module, and a multilingual speech synthesis system.
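The abstract says only that the recognizer uses statistical n-gram language models. It does not say which n-gram order or smoothing scheme MASTOR uses. The sketch below is a generic illustration of what such a model computes and is not the authors' implementation. It builds a bigram model that linearly interpolates the maximum-likelihood estimate P(w | v) with an add-one-smoothed unigram. The class name `InterpolatedBigramLM`, the interpolation weight `lam`, and the toy corpus are hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Iterable, List


class InterpolatedBigramLM:
    """Hypothetical toy bigram LM (not MASTOR's model):
    P(w | v) = lam * P_ML(w | v) + (1 - lam) * P_uni(w),
    where P_uni uses add-one smoothing so unseen words keep non-zero mass."""

    BOS, EOS = "<s>", "</s>"

    def __init__(self, sentences: Iterable[List[str]], lam: float = 0.7) -> None:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        self.lam = lam
        self.unigrams: Counter = Counter()       # counts of predicted tokens
        self.bigrams: Counter = Counter()        # counts of (history, word) pairs
        self.history_counts: Counter = Counter() # counts of each history token
        for words in sentences:
            tokens = [self.BOS] + list(words) + [self.EOS]
            self.unigrams.update(tokens[1:])  # <s> is never predicted
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history_counts[prev] += 1
        self.total = sum(self.unigrams.values())
        # +1 reserves one out-of-vocabulary slot; an unseen word gets its probability
        self.vocab_size = len(self.unigrams) + 1

    def unigram_prob(self, word: str) -> float:
        # Add-one smoothing over the seen vocabulary plus the OOV slot
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev: str, word: str) -> float:
        hist = self.history_counts[prev]
        ml = self.bigrams[(prev, word)] / hist if hist else 0.0
        return self.lam * ml + (1.0 - self.lam) * self.unigram_prob(word)

    def sentence_logprob(self, words: List[str]) -> float:
        # Natural-log probability of the sentence, including the end marker
        tokens = [self.BOS] + list(words) + [self.EOS]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


corpus = [["how", "are", "you"], ["how", "old", "are", "you"], ["where", "are", "you"]]
lm = InterpolatedBigramLM(corpus, lam=0.7)
print(lm.sentence_logprob(["how", "are", "you"]) > lm.sentence_logprob(["you", "are", "how"]))  # True
```

During recognition, a decoder adds a score like this to the HMM acoustic score of each hypothesis. The grammatical word order is preferred here because its bigrams were observed in training. The interpolation with the unigram term keeps every probability non-zero, so unseen word pairs lower a hypothesis's score without eliminating it.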




Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Gao, Y., Zhou, B., Zhu, W., Zhang, W. (2008). Handheld Speech to Speech Translation System. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_15


  • DOI: https://doi.org/10.1007/978-1-84800-143-5_15

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-142-8

  • Online ISBN: 978-1-84800-143-5

  • eBook Packages: Computer Science, Computer Science (R0)
