Parse and Corpus-Based Machine Translation

  • Vincent Vandeghinste
  • Scott Martens
  • Gideon Kotzé
  • Jörg Tiedemann
  • Joachim Van den Bogaert
  • Koen De Smet
  • Frank Van Eynde
  • Gertjan van Noord
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

This paper describes the PaCo-MT project, which investigated Parse and Corpus-based Machine Translation: a data-driven approach to stochastic syntactic rule-based machine translation. In contrast to phrase-based statistical machine translation (PB-SMT) systems, which are string-based and do not use any linguistic knowledge, an MT engine in a different paradigm was built: a tree-based data-driven system that automatically induces translation rules from a large syntactically analysed parallel corpus. The architecture is presented in detail, together with an evaluation in comparison with our previous work and with the current state-of-the-art PB-SMT system Moses.

Copyright information

© The Author(s) 2013

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Vincent Vandeghinste (1)
  • Scott Martens (2)
  • Gideon Kotzé (3)
  • Jörg Tiedemann (4)
  • Joachim Van den Bogaert (1)
  • Koen De Smet (5)
  • Frank Van Eynde (1)
  • Gertjan van Noord (3)

  1. Centrum voor Computerlinguïstiek (CCL), Leuven University, Leuven, Belgium
  2. University of Tübingen (previously at CCL), Tübingen, Germany
  3. Groningen University, Groningen, The Netherlands
  4. University of Uppsala (previously at Groningen University), Uppsala, Sweden
  5. Oneliner bvba, Sint-Niklaas, Belgium
