An English-Hindi Statistical Machine Translation System

  • Raghavendra Udupa U.
  • Tanveer A. Faruquie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


Statistical methods for natural language translation have recently become popular and have achieved reasonable success. In this paper we describe an English-Hindi statistical machine translation system based on IBM Models 1, 2, and 3. We present experimental results on an English-Hindi parallel corpus consisting of 150,000 sentence pairs. We propose two new algorithms for the transfer of fertility parameters from Model 2 to Model 3. Our algorithms have a worst-case time complexity of O(m^3), improving on the exponential-time algorithm proposed in the classical paper on IBM Models. When the maximum fertility of a word is small, our algorithms run in O(m^2) time and hence are very efficient in practice.
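The abstract does not spell out the two transfer algorithms, so the following is only an illustrative sketch of one standard way such a fertility distribution can be computed in polynomial time, and it should not be read as the authors' method. It assumes that, under Model 2, each target position aligns to a fixed source word independently with some posterior probability, so that the word's fertility is a sum of independent Bernoulli variables. The function name `fertility_distribution`, the list `p`, and the cap `max_fertility` are hypothetical names introduced here.

```python
from typing import List


def fertility_distribution(p: List[float], max_fertility: int) -> List[float]:
    """Return dist[k] = P(fertility == k) for k = 0..max_fertility.

    Assumption (not taken from the paper): p[j] is the posterior probability
    that target position j is aligned to one fixed source word, and the
    positions are independent. Probability mass for fertilities above
    max_fertility is simply dropped.
    """
    # Before any target position is considered, the fertility is 0 for sure.
    dist = [1.0] + [0.0] * max_fertility
    for pj in p:
        # Walk k downwards so dist[k - 1] still holds the previous step's value.
        for k in range(max_fertility, 0, -1):
            dist[k] = dist[k] * (1.0 - pj) + dist[k - 1] * pj
        dist[0] *= 1.0 - pj
    return dist


if __name__ == "__main__":
    # Hypothetical posteriors for a three-word target sentence.
    print(fertility_distribution([0.5, 0.5, 0.1], max_fertility=2))
```

For one source word this loop costs O(m * max_fertility) for a target sentence of length m. Repeating it for every source word of a sentence of comparable length gives roughly O(m^3) when the fertility cap is as large as m, and O(m^2) when the cap is a small constant, which is in line with the bounds quoted in the abstract.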


Machine Translation, Translation Model, Statistical Machine Translation, English Sentence, Sentence Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The mathematics of Statistical Machine Translation: Parameter estimation. Computational Linguistics 19(2), 263–311 (1993)
  2. Berger, A., Della Pietra, S., Della Pietra, V.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1) (1996)
  3. Brown, P.F., Della Pietra, V.J., de Souza, P.V., Lai, J.C., Mercer, R.L.: Class based n-gram models for natural language. Computational Linguistics 18(4) (1992)
  4. Berger, A., Brown, P., Della Pietra, S., Della Pietra, V., Gillette, J., Lafferty, J., Mercer, R., Printz, H., Ures, L.: The Candide system for machine translation. In: Proceedings of the ARPA Human Language Technology Workshop (1994)
  5. Baum, L.E.: An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)
  6. Knight, K.: Decoding complexity in word replacement translation models. Computational Linguistics 25(4) (1999)
  7. Jelinek, F.: A fast sequential decoding algorithm using a stack. IBM Research Journal 13 (1969)
  8. Brown, R.D.: Example-based Machine Translation in the Pangloss System. In: International Conference on Computational Linguistics (COLING 1996), Copenhagen, Denmark (August 1996)
  9. Sinha, R.M.K., Sivaraman, K., Agrawal, A., Jain, R., Srivastava, R., Jain, A.: ANGLABHARTI: A Multilingual Machine Aided Translation Project on Translation from English to Hindi. In: IEEE International Conference on Systems, Man and Cybernetics, Vancouver, Canada (1995)
  10. Tillmann, C., Vogel, S., Ney, H., Zubiaga, A.: A DP-based search using monotone alignments in statistical translation. In: Proc. ACL (1997)
  11. Tillmann, C.: Word Re-Ordering and Dynamic Programming based Search Algorithm for Statistical Machine Translation. Ph.D. Thesis (2001)
  12. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: Bleu: A Method for Automatic Evaluation of Machine Translation. IBM Research Report, RC22176, W0109-022 (2001)
  13. Doddington, G.: Automatic Evaluation of Machine Translation Quality using N-gram Co-occurrence Statistics. In: Human Language Technology: Notebook Proceedings, pp. 128–132 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Raghavendra Udupa U. (1)
  • Tanveer A. Faruquie (1)
  1. IBM India Research Lab, New Delhi, India
