Hungarian-English Machine Translation Using GenPar

  • Conference paper
Text, Speech and Dialogue (TSD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Abstract

We present an approach to machine translation that applies the GenPar toolkit to POS-tagged and syntactically parsed texts. Our Hungarian-English experiment is an attempt to develop prototypes of a syntax-driven machine translation system and to examine how various preprocessing steps (POS tagging, lemmatization and syntactic parsing) affect system performance. The annotated monolingual texts needed for the language-specific tasks were taken from the Szeged Treebank and the Penn Treebank, and the parallel sentences were collected from the Hunglish Corpus. Each prototype runs fully automatically and includes newly built Hungarian-specific functions. The results are evaluated using the BLEU score.
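
As an illustration of the evaluation metric named above, the following is a minimal sketch of BLEU in Python. It assumes a single reference per sentence, whitespace-tokenized input, sentence-level scoring and no smoothing. The function names `ngram_counts` and `bleu` are hypothetical and are not taken from the paper or from GenPar. BLEU is normally reported at corpus level, so this sketch only shows the two ingredients of the score: clipped n-gram precision and the brevity penalty.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing (illustrative)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed BLEU is zero if any order has no match
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalise candidates that are not longer than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat sat on a mat".split()
    print(f"BLEU = {bleu(candidate, reference):.4f}")
```

Because the score is zero whenever any n-gram order has no match, short or poorly matching sentences score 0 here; practical evaluations avoid this by aggregating counts over the whole test set or by smoothing.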


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hócza, A., Kocsor, A. (2006). Hungarian-English Machine Translation Using GenPar. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_11

  • DOI: https://doi.org/10.1007/11846406_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39090-9

  • Online ISBN: 978-3-540-39091-6

  • eBook Packages: Computer Science, Computer Science (R0)
