Soft syntactic constraints for Arabic–English hierarchical phrase-based translation

Marton, Yuval; Chiang, David; Resnik, Philip

doi:10.1007/s10590-011-9111-z

Published: 26 October 2011

Machine Translation, volume 26, pages 137–157 (2012)

Abstract

In adding syntax to statistical machine translation, there is a tradeoff between taking advantage of linguistic analysis and allowing the model to exploit parallel training data with no linguistic analysis: translation quality versus coverage. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We argue that in order for these constraints to improve translation, they must be fine-grained: the constraints should vary by constituent type, and by the type of match or mismatch with the parse. We also use a different feature weight optimization technique, capable of handling a large number of features, thus eliminating the bottleneck of feature selection. We obtain substantial improvements in performance for translation from Arabic to English.
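
To make the idea of fine-grained soft constraints concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a source-side parse given as labelled token spans and counts one feature per constituent type and per relation to the span a translation rule covers (exact match versus crossing a constituent boundary). The function names, the feature names such as `NP:match` and `NP:cross`, and the example weights are all hypothetical.

```python
from collections import Counter


def soft_constraint_features(span, constituents):
    """Count hypothetical soft-constraint features for one source span.

    span: (start, end) half-open token indices covered by a rule.
    constituents: list of (label, start, end) spans from a source parse.
    """
    start, end = span
    feats = Counter()
    for label, c_start, c_end in constituents:
        if (c_start, c_end) == (start, end):
            # The span coincides exactly with a constituent of this type.
            feats[f"{label}:match"] += 1
        elif start < c_start < end < c_end or c_start < start < c_end < end:
            # The span overlaps the constituent without nesting: it crosses
            # one of the constituent's boundaries.
            feats[f"{label}:cross"] += 1
    return feats


def score(feats, weights):
    """Weighted sum of feature counts; unknown features contribute zero."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items())


if __name__ == "__main__":
    # Hypothetical parse of a five-token source sentence.
    parse = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
    # Hypothetical weights; in practice these would be learned, not set by hand.
    weights = {"NP:match": 0.5, "NP:cross": -0.25, "VP:cross": -0.125}

    for span in [(0, 2), (1, 4)]:
        feats = soft_constraint_features(span, parse)
        print(span, dict(feats), score(feats, weights))
```

Because each feature only adds to or subtracts from a hypothesis score, a span that disagrees with the parse is penalized rather than forbidden, which is what makes the constraint soft. Keeping a separate weight for every constituent type and relation is also why the number of features grows quickly.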

Author information

Authors and Affiliations

Yuval Marton: IBM T.J. Watson Research Center, 1101 Kitchawan Road / Route 134, Yorktown Heights, NY, 10598, USA
David Chiang: USC Information Sciences Institute (ISI), 4676 Admiralty Way, Suite 1001, Marina del Rey, CA, 90292, USA
Philip Resnik: Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park, MD, 20742-7505, USA

Corresponding author

Correspondence to Yuval Marton.

Additional information

Yuval Marton was at University of Maryland, College Park, at the time the experiments described here took place.

About this article

Cite this article

Marton, Y., Chiang, D. & Resnik, P. Soft syntactic constraints for Arabic–English hierarchical phrase-based translation. Machine Translation 26, 137–157 (2012). https://doi.org/10.1007/s10590-011-9111-z

Received: 03 July 2010
Accepted: 22 August 2011
Published: 26 October 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10590-011-9111-z
