Abstract
Current statistical machine translation systems are mainly based on statistical word lexicons. However, these models are usually context-independent, so the translation of a source word must be disambiguated using other probability distributions (distortion distributions and statistical language models). One efficient way to add contextual information to statistical lexicons is maximum entropy modeling. In this framework, context is introduced through feature functions that allow us to automatically learn context-dependent lexicon models.
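A context-dependent maximum entropy lexicon model of this kind computes p(e | f, ctx) as a log-linear combination of weighted feature functions, normalized over the candidate translations of the source word. The sketch below illustrates the idea; the feature functions, weights, word pairs, and candidate list are invented purely for illustration, not taken from the paper.

```python
import math

# Toy sketch of a context-dependent maximum entropy lexicon model:
#   p(e | f, ctx) = exp(sum_i w_i * f_i(e, f, ctx)) / Z(f, ctx)
def feat_lexical(e, f, ctx):
    # Fires on known (source, target) word pairs (hypothetical mini-lexicon).
    return 1.0 if (f, e) in {("chambre", "room"), ("maison", "house")} else 0.0

def feat_context(e, f, ctx):
    # Context feature: "chambre" near "Communes" suggests the reading "House".
    return 1.0 if e == "House" and "Communes" in ctx else 0.0

FEATURES = (feat_lexical, feat_context)

def me_lexicon_prob(e, f, ctx, weights, candidates):
    def unnorm(cand):
        return math.exp(sum(w * g(cand, f, ctx)
                            for w, g in zip(weights, FEATURES)))
    z = sum(unnorm(c) for c in candidates)  # normalize over candidate translations
    return unnorm(e) / z

weights = [1.0, 2.0]  # in practice these would be learned from data
p = me_lexicon_prob("House", "chambre", ("des", "Communes"),
                    weights, ["room", "House", "chamber"])
```

With the context word "Communes" present, the context feature fires and the model assigns most of the probability mass to "House", even though the plain lexical feature favours "room".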
In the first approach, maximum entropy modeling is carried out after learning the standard statistical models (alignment and lexicon). In the second approach, maximum entropy modeling is integrated into the expectation-maximization process of learning the standard statistical models.
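In both approaches, the core step is fitting the feature weights of a conditional maximum entropy model, classically done with generalized iterative scaling (GIS). The sketch below runs GIS on a tiny invented dataset (the words, contexts, and features are hypothetical, chosen only to show the mechanics; a slack feature pads every event to the constant feature total C that GIS requires).

```python
import math

# Toy events: (source word, context words, observed translation).
DATA = [
    ("chambre", ("des", "Communes"), "House"),
    ("chambre", ("la", "verte"), "room"),
    ("chambre", ("ma", "petite"), "room"),
]
CANDIDATES = ("House", "room")

def features(e, ctx):
    f = [
        1.0 if e == "House" and "Communes" in ctx else 0.0,  # context feature
        1.0 if e == "House" else 0.0,                        # lexical features
        1.0 if e == "room" else 0.0,
    ]
    f.append(2.0 - sum(f))  # slack feature pads each event to the constant C = 2
    return f

def probs(ctx, w):
    scores = {e: math.exp(sum(wi * fi for wi, fi in zip(w, features(e, ctx))))
              for e in CANDIDATES}
    z = sum(scores.values())
    return {e: s / z for e, s in scores.items()}

def gis(data, n_iters=500, C=2.0):
    n = len(features("House", ()))
    w = [0.0] * n
    # Empirical feature counts over the observed events.
    emp = [sum(features(e, ctx)[i] for _, ctx, e in data) for i in range(n)]
    for _ in range(n_iters):
        exp_c = [0.0] * n  # expected counts under the current model
        for _, ctx, _ in data:
            p = probs(ctx, w)
            for e in CANDIDATES:
                for i, fi in enumerate(features(e, ctx)):
                    exp_c[i] += p[e] * fi
        # GIS update: w_i += (1/C) * log(empirical / expected)
        w = [wi + math.log(emp[i] / exp_c[i]) / C for i, wi in enumerate(w)]
    return w

w = gis(DATA)
p = probs(("des", "Communes"), w)  # parliamentary context now favours "House"
```

In the paper's second approach this weight fitting is not run on supervised events but on the fractional counts produced by the E-step of alignment training, so the lexicon used inside EM is itself a maximum entropy model.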
Experimental results were obtained for two well-known tasks: the French–English Canadian Parliament Hansards task and the German–English Verbmobil task. These results showed that the use of maximum entropy models in both approaches can help improve the performance of statistical translation systems.
Abbreviations
- ME: Maximum Entropy
- SMT: Statistical Machine Translation
- EM: Expectation–Maximization
Additional information
This work was partially supported by the European Union under grant IST-2001-32091 and by the Spanish CICYT under project TIC-2003-08681-C02-02. The experiments on the Verbmobil task were carried out while the first author was a visiting scientist at RWTH Aachen, Germany.
Editors: Dan Roth and Pascale Fung
Cite this article
García-Varea, I., Casacuberta, F. Maximum Entropy Modeling: A Suitable Framework to Learn Context-Dependent Lexicon Models for Statistical Machine Translation. Mach Learn 60, 135–158 (2005). https://doi.org/10.1007/s10994-005-0915-z