Learning Finite State Transducers Using Bilingual Phrases

González, Jorge; Sanchis, Germán; Casacuberta, Francisco

doi:10.1007/978-3-540-78135-6_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Statistical machine translation is receiving increasing attention owing to the success of phrase-based alignment models. Despite their power, however, state-of-the-art systems built on these models have several disadvantages that reduce their effectiveness in working environments where computational resources, in time or space, are limited. A finite-state framework is an interesting alternative because it is an efficient paradigm in which translation quality and real-time response are properly integrated, making it possible to build translation devices that are of real help to their potential users. Here, we describe a way to use the bilingual information in a phrase-based model to implement a phrase-based n-gram model using finite-state transducers. The effort is worthwhile because, in practice, finite-state transducers have notably lower computational requirements than some well-known stack-decoding algorithms. Results are reported for the French-English EuroParl benchmark corpus from the 2006 Workshop on Machine Translation of the ACL.
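The general idea of an n-gram model over bilingual phrases, read as a finite-state transducer, can be illustrated with a small sketch. This is not the authors' system: the toy corpus, the bigram order, the unsmoothed relative-frequency estimates, and the names `TRAIN`, `train_bigram`, and `translate` are all assumptions made for illustration. Each state is the previously used phrase pair, each edge reads a source phrase and emits a target phrase, and decoding is a monotone dynamic-programming search rather than a stack decoder.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each sentence is already segmented into a monotone
# sequence of (source phrase, target phrase) pairs.
TRAIN = [
    [("la maison", "the house"), ("est", "is"), ("bleue", "blue")],
    [("la maison", "the house"), ("est", "is"), ("grande", "big")],
    [("la voiture", "the car"), ("est", "is"), ("bleue", "blue")],
]

START, END = ("<s>", "<s>"), ("</s>", "</s>")


def train_bigram(corpus):
    """Estimate P(pair | previous pair) by relative frequency (no smoothing)."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = START
        for pair in sentence + [END]:
            counts[prev][pair] += 1
            prev = pair
    model = {}
    for prev, successors in counts.items():
        total = sum(successors.values())
        model[prev] = {pair: c / total for pair, c in successors.items()}
    return model


def translate(model, source):
    """Best monotone path through the transducer; None if no path accepts."""
    words = source.split()
    n = len(words)
    # best[(source position, state)] = (log-probability, emitted target phrases)
    best = {(0, START): (0.0, [])}
    for i in range(n):
        for (pos, state), (logp, out) in list(best.items()):
            if pos != i:
                continue
            for pair, prob in model.get(state, {}).items():
                if pair == END:
                    continue
                src = pair[0].split()
                j = i + len(src)
                if words[i:j] != src:
                    continue  # edge's input phrase does not match here
                cand = (logp + math.log(prob), out + [pair[1]])
                if (j, pair) not in best or cand[0] > best[(j, pair)][0]:
                    best[(j, pair)] = cand
    # Keep only paths that consumed the whole input and can reach END.
    finals = [
        (logp + math.log(model[state][END]), out)
        for (pos, state), (logp, out) in best.items()
        if pos == n and END in model.get(state, {})
    ]
    if not finals:
        return None
    return " ".join(max(finals, key=lambda t: t[0])[1])


model = train_bigram(TRAIN)
print(translate(model, "la voiture est grande"))  # the car is big
```

The input sentence does not occur in the toy corpus, yet it is translated by recombining phrase pairs seen in different training sentences. Because the search only advances left to right over a fixed set of states, its cost grows with the input length and the number of matching edges, which hints at why a finite-state formulation can be cheaper than a general stack-decoding search.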

This work has been partially supported by the EC (FEDER) and the Spanish projects TIN2006-15694-C02-01 and the Consolider Ingenio 2010 CSD2007-00018.

Author information

Authors and Affiliations

Instituto Tecnológico de Informática, Universidad Politécnica de Valencia,
Jorge González, Germán Sanchis & Francisco Casacuberta

Editor information

Alexander Gelbukh

About this paper

Cite this paper

González, J., Sanchis, G., Casacuberta, F. (2008). Learning Finite State Transducers Using Bilingual Phrases. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_35

DOI: https://doi.org/10.1007/978-3-540-78135-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science, Computer Science (R0)
