Machine Translation and Type Theory
This paper gives an introduction to automatic translation via examples from the history of the field, in which statistical and grammar-based methods have alternated. Grammatical Framework (GF) is introduced as an approach that uses type theory to provide high-quality translation between multiple languages. GF translation is fundamentally grammar-based but can be combined with statistical methods, such as learning translation models from a corpus and ranking translation candidates by probability.
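The pipeline the abstract describes (translation through a shared abstract syntax, with probabilistic ranking of candidates) can be illustrated with a small Python sketch. Everything in it is a hypothetical toy: the abstract functions `Pred`, `This`, `Wine`, `Pizza`, `Italian`, `LightWeight` and `LightColour`, the word lists, and the probabilities in `PROB` are invented for illustration and are not taken from the paper or from any GF grammar. Real GF grammars are written in GF's own language, not Python, and the per-function probability model here is only one simple way of ranking candidates, assumed for the example. English "light" is deliberately ambiguous, so parsing yields two abstract trees; the more probable one is linearized into Italian, where the adjective must agree with the noun's gender.

```python
# A minimal, hypothetical sketch of GF-style translation: one shared abstract
# syntax, one linearization function per language, naive parsing by
# generate-and-match, and ranking of ambiguous parses by probability.
from itertools import product
from math import prod

# Hypothetical abstract syntax: function name -> (result category, argument categories).
ABSTRACT = {
    "Pred": ("Phrase", ["Item", "Quality"]),
    "This": ("Item", ["Kind"]),
    "Wine": ("Kind", []),
    "Pizza": ("Kind", []),
    "Italian": ("Quality", []),
    "LightWeight": ("Quality", []),
    "LightColour": ("Quality", []),
}

# Illustrative probabilities per function (invented, not learned from a corpus);
# they sum to 1 within each category.
PROB = {"Pred": 1.0, "This": 1.0, "Wine": 0.5, "Pizza": 0.5,
        "Italian": 0.4, "LightWeight": 0.4, "LightColour": 0.2}


def trees(cat):
    """Enumerate every abstract tree of a category (this toy grammar is finite)."""
    for fun, (result, args) in ABSTRACT.items():
        if result == cat:
            for children in product(*(list(trees(a)) for a in args)):
                yield (fun, *children)


# English concrete syntax: "light" linearizes two different abstract functions.
ENG_WORDS = {"Wine": "wine", "Pizza": "pizza", "Italian": "Italian",
             "LightWeight": "light", "LightColour": "light"}


def lin_eng(tree):
    fun, *args = tree
    if fun == "Pred":
        return {"s": f"{lin_eng(args[0])['s']} is {lin_eng(args[1])['s']}"}
    if fun == "This":
        return {"s": f"this {lin_eng(args[0])['s']}"}
    return {"s": ENG_WORDS[fun]}


# Italian concrete syntax: nouns carry a gender, adjectives are tables over gender.
ITA_KIND = {"Wine": ("vino", "masc"), "Pizza": ("pizza", "fem")}
ITA_QUAL = {"Italian": {"masc": "italiano", "fem": "italiana"},
            "LightWeight": {"masc": "leggero", "fem": "leggera"},
            "LightColour": {"masc": "chiaro", "fem": "chiara"}}


def lin_ita(tree):
    fun, *args = tree
    if fun in ITA_KIND:
        noun, gender = ITA_KIND[fun]
        return {"s": noun, "g": gender}
    if fun in ITA_QUAL:
        return {"s": ITA_QUAL[fun]}
    if fun == "This":
        kind = lin_ita(args[0])
        det = "questo" if kind["g"] == "masc" else "questa"
        return {"s": f"{det} {kind['s']}", "g": kind["g"]}
    if fun == "Pred":
        item, qual = lin_ita(args[0]), lin_ita(args[1])
        # Agreement: select the adjective form by the item's gender.
        return {"s": f"{item['s']} è {qual['s'][item['g']]}"}
    raise ValueError(f"unknown function {fun}")


def parse(lin, text, cat="Phrase"):
    """Naive parser: all trees whose linearization equals the input string."""
    return [t for t in trees(cat) if lin(t)["s"] == text]


def probability(tree):
    """Tree probability as the product of its functions' probabilities."""
    fun, *args = tree
    return PROB[fun] * prod(probability(a) for a in args)


def translate(text, source, target):
    """Parse in the source, keep the most probable tree, linearize in the target."""
    candidates = parse(source, text)
    if not candidates:
        raise ValueError(f"no parse for {text!r}")
    best = max(candidates, key=probability)
    return target(best)["s"]


print(translate("this wine is light", lin_eng, lin_ita))     # prints: questo vino è leggero
print(translate("this pizza is Italian", lin_eng, lin_ita))  # prints: questa pizza è italiana
```

Because both languages share the abstract syntax, the same functions translate in the other direction, e.g. `translate("questa pizza è italiana", lin_ita, lin_eng)`. Parsing by enumerating and matching trees is used here only because the toy grammar is finite; it is not how a GF system parses.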
Keywords: Noun phrase, Machine translation, Target language, Translation system, Abstract syntax
Per Martin-Löf supervised my PhD thesis and taught me how to think about type theory and language. He also introduced me to Shannon's noisy channel model. He wondered whether statistical models were still considered useful in natural language processing, and this question has been a recurring theme in our discussions ever since. With his unique combination of insights into both statistics and logic, and his accurate knowledge of many languages, Per has continued to be a major resource for my work through the 21 years that have passed since my PhD. When I later started to look more closely at statistical methods, I received inspiration and guidance from Joakim Nivre, Lluís Màrquez, and Cristina España. Lauri Carlson has helped me to understand the problems of translation in general. The model described in this paper has received substantial contributions from my own PhD students Peter Ljunglöf, Kristofer Johannisson, Janna Khegai, Markus Forsberg, Björn Bringert, Krasimir Angelov, and Ramona Enache. The insightful comments from an anonymous referee were valuable when preparing the final version of the paper. The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. FP7-ICT-247914 (http://www.molto-project.eu).