Handheld Speech to Speech Translation System

Gao, Yuqing; Zhou, Bowen; Zhu, Weizhong; Zhang, Wei

doi:10.1007/978-1-84800-143-5_15

Part of the book series: Advances in Pattern Recognition (ACVPR)

Recent advances in the processing capabilities of handheld devices (PDAs and mobile phones) have made it possible to run speech recognition systems, and even end-to-end speech translation systems, on these devices. However, two-way free-form speech-to-speech translation (as opposed to fixed-phrase translation) is a highly complex task, and achieving reliable performance requires a large amount of computation. The resource limitations are not just CPU speed: memory and storage requirements and audio input and output requirements all tax current systems to their limits. When the resource demand exceeds the computational capability of available state-of-the-art handheld devices, a common technique for mobile speech-to-speech translation is a client-server approach, in which the handheld device (a mobile phone or PDA) is treated simply as a system client. While we briefly describe the client-server approach, we focus mainly on the approach in which the end-to-end speech-to-speech translation system is hosted entirely on the handheld device. We describe the challenges, and the algorithm and code optimization solutions, that we developed for the handheld MASTOR (Multilingual Automatic Speech-to-Speech Translator) systems, which translate between English and Mandarin Chinese and between English and Arabic on embedded Linux and Windows CE operating systems. The system includes an HMM-based large-vocabulary continuous speech recognizer using statistical n-grams, a translation module, and a multi-language speech synthesis system.
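
As a rough illustration of the three-component structure named in the abstract (recognizer, translation module, synthesizer), the following minimal sketch chains three stages into one translation direction and builds a two-way system from two such chains. It is not the MASTOR implementation: the class name `CascadeTranslator`, the toy stage functions, and the one-entry phrase tables are all hypothetical stand-ins, and "audio" is simulated as UTF-8 bytes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CascadeTranslator:
    """One translation direction: recognizer -> translator -> synthesizer."""
    recognize: Callable[[bytes], str]    # audio in, source-language text out
    translate: Callable[[str], str]      # source text in, target text out
    synthesize: Callable[[str], bytes]   # target text in, audio out

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (hypothetical): "audio" is just the UTF-8 bytes of the words.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def make_toy_translator(table: Dict[str, str]) -> Callable[[str], str]:
    # Whole-utterance lookup; unknown input is passed through unchanged.
    return lambda text: table.get(text, text)


# Illustrative one-entry phrase tables, not data from the chapter.
EN_TO_ZH = {"hello": "你好"}
ZH_TO_EN = {v: k for k, v in EN_TO_ZH.items()}

# Two-way translation is modeled as two independent one-way cascades.
en_to_zh = CascadeTranslator(toy_recognize, make_toy_translator(EN_TO_ZH), toy_synthesize)
zh_to_en = CascadeTranslator(toy_recognize, make_toy_translator(ZH_TO_EN), toy_synthesize)

if __name__ == "__main__":
    print(en_to_zh.run(b"hello").decode("utf-8"))
```

Keeping each stage behind a plain callable means the same chain could, in principle, run with all stages on the device or with some stages replaced by calls to a server, which is the distinction the abstract draws between the fully hosted and client-server approaches.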

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, USA
Yuqing Gao, Bowen Zhou, Weizhong Zhu & Wei Zhang

About this chapter

Cite this chapter

Gao, Y., Zhou, B., Zhu, W., Zhang, W. (2008). Handheld Speech to Speech Translation System. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_15

DOI: https://doi.org/10.1007/978-1-84800-143-5_15
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)
