Recent advances in the processing capabilities of handheld devices (PDAs or mobile phones) have made it possible to run speech recognition systems, and even end-to-end speech translation systems, on these devices. However, two-way free-form speech-to-speech translation (as opposed to fixed-phrase translation) is a highly complex task, and a large amount of computation is involved in achieving reliable transformation performance. The resource constraints are not limited to CPU speed: memory and storage requirements, as well as audio input and output requirements, all tax current systems to their limits. When the resource demand exceeds the computational capability of available state-of-the-art handheld devices, a common technique for mobile speech-to-speech translation systems is a client-server approach, in which the handheld device (a mobile phone or PDA) is treated simply as a system client. While we briefly describe the client-server approach, we focus mainly on the approach in which the end-to-end speech-to-speech translation system is hosted entirely on the handheld device. We describe the challenges, and the algorithmic and code optimization solutions we developed, for the handheld MASTOR (Multilingual Automatic Speech-to-Speech Translator) systems, which translate between English and Mandarin Chinese and between English and Arabic on the embedded Linux and Windows CE operating systems. The system includes an HMM-based large-vocabulary continuous speech recognizer using statistical n-grams, a translation module, and a multi-language speech synthesis system.
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
Gao, Y., Zhou, B., Zhu, W., Zhang, W. (2008). Handheld Speech to Speech Translation System. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_15
DOI: https://doi.org/10.1007/978-1-84800-143-5_15
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)