Machine Translation, Volume 23, Issue 1, pp 23–63

Symbolic-to-statistical hybridization: extending generation-heavy machine translation

  • Nizar Habash
  • Bonnie Dorr
  • Christof Monz

Abstract

The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT’s statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic–English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT—a primarily symbolic system extended with monolingual and bilingual statistical components—has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.

Keywords

  • Hybrid machine translation
  • Generation-heavy machine translation
  • Statistical machine translation
  • Arabic–English machine translation

Notes

Acknowledgements

This work has been supported, in part, by Army Research Lab Cooperative Agreement DAAD190320020, NSF CISE Research Infrastructure Award EIA0130422, Office of Naval Research MURI Contract FCPO.810548265, DoD Contract MDA904-96-C-1250, ONR MURI Contract FCPO.810548265, Mitre Contract 010418-7712, the GALE program of the Defense Advanced Research Projects Agency, Contracts No. HR0011-06-2-0001 and HR0011-06-C-0023, and the Human Language Technology Center of Excellence. Any opinions, findings, conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the sponsors. We would like to thank Necip Fazil Ayan and Nitin Madnani for help with some of the experiments. We would like to thank Kishore Papineni for providing us with the confidence interval computation code. We would like to thank Owen Rambow, Srinivas Bangalore and Alexis Nasr for providing us with their MICA parser. We would also like to thank Amy Weinberg for helpful conversations.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Center for Computational Learning Systems, Columbia University, New York, USA
  2. Institute for Advanced Computer Studies, University of Maryland, College Park, USA
  3. Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
