Hungarian-English Machine Translation Using GenPar

  • András Hócza
  • András Kocsor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


We present an approach to machine translation that applies the GenPar toolkit to POS-tagged and syntactically parsed texts. Our Hungarian-English experiment is an attempt to develop prototypes of a syntax-driven machine translation system and to examine how various preprocessing steps (POS tagging, lemmatization and syntactic parsing) affect system performance. The annotated monolingual texts needed for the language-specific tasks were taken from the Szeged Treebank and the Penn Treebank, and the parallel sentences were collected from the Hunglish Corpus. Each prototype runs fully automatically and includes newly built-in Hungarian-specific functions. The results are evaluated using the BLEU score.
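For readers unfamiliar with the metric, the BLEU evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes one reference translation per test sentence, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical and are not taken from GenPar or from the authors' evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU; assumes one reference per candidate, uniform weights."""
    matched = [0] * max_n   # clipped n-gram matches, per n-gram order
    total = [0] * max_n     # candidate n-gram counts, per n-gram order
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty for output shorter than (or equal in length to) the reference.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = [["the", "cat", "sat", "on"]]
    ref = [["the", "cat", "sat", "on", "the", "mat"]]
    print(corpus_bleu(hyp, ref))
```

In this sketch all n-gram precisions of the example hypothesis equal 1, so the score is determined by the brevity penalty alone. Published results are normally computed with an established scoring script rather than a hand-written one.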


Keywords: Machine Translation; Test Sentence; Statistical Machine Translation; Sentence Pair; Parallel Corpus





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • András Hócza (1)
  • András Kocsor (1)
  1. Department of Informatics, University of Szeged, Szeged, Hungary
