Abstract
Current statistical machine translation systems are mainly based on statistical word lexicons. However, these models are usually context-independent, so the translation of a source word must be disambiguated using other probability distributions (distortion distributions and statistical language models). One efficient way to add contextual information to statistical lexicons is maximum entropy modeling. In this framework, context is introduced through feature functions that allow us to automatically learn context-dependent lexicon models.
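A context-dependent maximum entropy lexicon model of this kind computes p(e | f, ctx) as a log-linear combination of weighted feature functions, normalized over the candidate translations of the source word. The sketch below illustrates the idea; the feature functions, weights, word pairs, and candidate list are invented purely for illustration, not taken from the paper.

```python
import math

# Toy sketch of a context-dependent maximum entropy lexicon model:
#   p(e | f, ctx) = exp(sum_i w_i * f_i(e, f, ctx)) / Z(f, ctx)
def feat_lexical(e, f, ctx):
    # Fires on known (source, target) word pairs (hypothetical mini-lexicon).
    return 1.0 if (f, e) in {("chambre", "room"), ("maison", "house")} else 0.0

def feat_context(e, f, ctx):
    # Context feature: "chambre" near "Communes" suggests the reading "House".
    return 1.0 if e == "House" and "Communes" in ctx else 0.0

FEATURES = (feat_lexical, feat_context)

def me_lexicon_prob(e, f, ctx, weights, candidates):
    def unnorm(cand):
        return math.exp(sum(w * g(cand, f, ctx)
                            for w, g in zip(weights, FEATURES)))
    z = sum(unnorm(c) for c in candidates)  # normalize over candidate translations
    return unnorm(e) / z

weights = [1.0, 2.0]  # in practice these would be learned from data
p = me_lexicon_prob("House", "chambre", ("des", "Communes"),
                    weights, ["room", "House", "chamber"])
```

With the context word "Communes" present, the context feature fires and the model assigns most of the probability mass to "House", even though the plain lexical feature favours "room".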
In the first approach, maximum entropy modeling is carried out after learning the standard statistical models (alignment and lexicon). In the second approach, maximum entropy modeling is integrated into the expectation-maximization process of learning the standard statistical models.
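In both approaches, the core step is fitting the feature weights of a conditional maximum entropy model, classically done with generalized iterative scaling (GIS). The sketch below runs GIS on a tiny invented dataset (the words, contexts, and features are hypothetical, chosen only to show the mechanics; a slack feature pads every event to the constant feature total C that GIS requires).

```python
import math

# Toy events: (source word, context words, observed translation).
DATA = [
    ("chambre", ("des", "Communes"), "House"),
    ("chambre", ("la", "verte"), "room"),
    ("chambre", ("ma", "petite"), "room"),
]
CANDIDATES = ("House", "room")

def features(e, ctx):
    f = [
        1.0 if e == "House" and "Communes" in ctx else 0.0,  # context feature
        1.0 if e == "House" else 0.0,                        # lexical features
        1.0 if e == "room" else 0.0,
    ]
    f.append(2.0 - sum(f))  # slack feature pads each event to the constant C = 2
    return f

def probs(ctx, w):
    scores = {e: math.exp(sum(wi * fi for wi, fi in zip(w, features(e, ctx))))
              for e in CANDIDATES}
    z = sum(scores.values())
    return {e: s / z for e, s in scores.items()}

def gis(data, n_iters=500, C=2.0):
    n = len(features("House", ()))
    w = [0.0] * n
    # Empirical feature counts over the observed events.
    emp = [sum(features(e, ctx)[i] for _, ctx, e in data) for i in range(n)]
    for _ in range(n_iters):
        exp_c = [0.0] * n  # expected counts under the current model
        for _, ctx, _ in data:
            p = probs(ctx, w)
            for e in CANDIDATES:
                for i, fi in enumerate(features(e, ctx)):
                    exp_c[i] += p[e] * fi
        # GIS update: w_i += (1/C) * log(empirical / expected)
        w = [wi + math.log(emp[i] / exp_c[i]) / C for i, wi in enumerate(w)]
    return w

w = gis(DATA)
p = probs(("des", "Communes"), w)  # parliamentary context now favours "House"
```

In the paper's second approach this weight fitting is not run on supervised events but on the fractional counts produced by the E-step of alignment training, so the lexicon used inside EM is itself a maximum entropy model.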
Experimental results were obtained for two well-known tasks: the French–English Canadian Parliament Hansards task and the German–English Verbmobil task. These results showed that the use of maximum entropy models in both approaches can help improve the performance of statistical translation systems.
Abbreviations
- ME: Maximum Entropy
- SMT: Statistical Machine Translation
- EM: Expectation–Maximization
Additional information
This work was partially supported by the European Union under grant IST-2001-32091 and by the Spanish CICYT under project TIC-2003-08681-C02-02. The experiments on the Verbmobil task were carried out while the first author was a visiting scientist at RWTH Aachen, Germany.
Editors: Dan Roth and Pascale Fung
Cite this article
García-Varea, I., Casacuberta, F. Maximum Entropy Modeling: A Suitable Framework to Learn Context-Dependent Lexicon Models for Statistical Machine Translation. Mach Learn 60, 135–158 (2005). https://doi.org/10.1007/s10994-005-0915-z