Hungarian-English Machine Translation Using GenPar

Hócza, András; Kocsor, András

doi:10.1007/11846406_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4188)

Included in the following conference series: International Conference on Text, Speech and Dialogue

Abstract

We present an approach to machine translation that applies the GenPar toolkit to POS-tagged and syntactically parsed texts. Our Hungarian-English experiment is an attempt to develop prototypes of a syntax-driven machine translation system and to examine how various preprocessing steps (POS tagging, lemmatization and syntactic parsing) affect system performance. The annotated monolingual texts needed for the language-specific tasks were taken from the Szeged Treebank and the Penn Treebank; the parallel sentences were collected from the Hunglish Corpus. Each prototype runs fully automatically and incorporates new Hungarian-specific functions. The results are evaluated using the BLEU score.
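For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of corpus-level BLEU as it is commonly defined: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It is not the authors' evaluation script. The function names `ngrams` and `corpus_bleu` are hypothetical, and the sketch assumes one reference translation per sentence, pre-tokenized input, uniform weights and no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with exactly one reference per candidate (assumption).

    candidates, references: parallel lists of token lists.
    Returns a score in [0, 1]; 0.0 if any n-gram order has no match.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # candidate n-gram counts, per order
    cand_len = ref_len = 0

    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in cand_counts.items()
            )
            total[n - 1] += sum(cand_counts.values())

    if cand_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)
```

Calling `corpus_bleu([hyp.split()], [ref.split()])` on whitespace-tokenized strings gives a score between 0 and 1. Published results are often reported on a 0 to 100 scale and usually rely on multiple references or smoothing, so scores from this sketch are only indicative.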

Author information

Authors and Affiliations

Department of Informatics, University of Szeged, H-6720, Szeged, Árpád tér 2., Hungary
András Hócza & András Kocsor

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Hócza, A., Kocsor, A. (2006). Hungarian-English Machine Translation Using GenPar. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_11

DOI: https://doi.org/10.1007/11846406_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science, Computer Science (R0)