Machine Translation, Volume 20, Issue 4, pp 227–245

Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

  • Helena M. Caseli
  • Maria das Graças V. Nunes
  • Mikel L. Forcada


The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work; as a consequence, bilingual resources are usually harder to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when a less-resourced language is involved. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments on Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
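The abstract only outlines the methodology. As a rough illustration, and not the authors' actual algorithm, the Python sketch below shows the two kinds of knowledge one might extract from a word-aligned, POS-tagged parallel corpus. It builds bilingual dictionary entries by counting aligned (lemma, tag) pairs, and it builds shallow-transfer patterns by counting source/target tag sequences whose word links stay inside both spans. The token representation, the function names `induce_dictionary` and `extract_tag_patterns`, the `min_count` threshold and the toy Portuguese–English sentences are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: each token is a (lemma, POS tag) pair, as a
# morphological analyser plus a part-of-speech tagger might produce, and each
# alignment is a list of (source_index, target_index) word links.
Token = Tuple[str, str]
Link = Tuple[int, int]
SentencePair = Tuple[List[Token], List[Token], List[Link]]


def induce_dictionary(corpus: List[SentencePair], min_count: int = 2) -> Dict[Token, Token]:
    """Map each source (lemma, tag) to its most frequently aligned target
    (lemma, tag), keeping only pairs seen at least min_count times."""
    counts: Dict[Token, Counter] = defaultdict(Counter)
    for src, tgt, links in corpus:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1
    dictionary: Dict[Token, Token] = {}
    for source_word, targets in counts.items():
        target_word, n = targets.most_common(1)[0]
        if n >= min_count:
            dictionary[source_word] = target_word
    return dictionary


def extract_tag_patterns(corpus: List[SentencePair], length: int = 2, min_count: int = 2):
    """Count (source tags, target tags, relative links) triples for source spans
    of `length` words whose aligned target span is alignment-consistent, i.e.
    no word inside the target span is linked to a word outside the source span."""
    counts: Counter = Counter()
    for src, tgt, links in corpus:
        for start in range(len(src) - length + 1):
            span = range(start, start + length)
            targets = sorted({j for i, j in links if i in span})
            if not targets:
                continue  # unaligned source span: nothing to learn
            lo, hi = targets[0], targets[-1]
            if any(lo <= j <= hi and i not in span for i, j in links):
                continue  # target span also covers words linked outside the source span
            src_tags = tuple(tag for _, tag in src[start:start + length])
            tgt_tags = tuple(tag for _, tag in tgt[lo:hi + 1])
            # Word links relative to the start of each span (encodes reordering).
            order = tuple(sorted((i - start, j - lo) for i, j in links if i in span))
            counts[(src_tags, tgt_tags, order)] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Toy Portuguese-English corpus ("a casa branca" / "the white house", ...).
CORPUS: List[SentencePair] = [
    ([("o", "det"), ("casa", "n"), ("branco", "adj")],
     [("the", "det"), ("white", "adj"), ("house", "n")],
     [(0, 0), (1, 2), (2, 1)]),
    ([("o", "det"), ("casa", "n"), ("novo", "adj")],
     [("the", "det"), ("new", "adj"), ("house", "n")],
     [(0, 0), (1, 2), (2, 1)]),
]

if __name__ == "__main__":
    print(induce_dictionary(CORPUS))
    print(extract_tag_patterns(CORPUS))
```

On this toy corpus the dictionary keeps only `o` → `the` and `casa` → `house`, since each adjective pair is seen once, below the threshold. The only pattern found is the noun–adjective reordering `n adj` → `adj n`. A real system would also need to handle multi-word units, unaligned words and conflicting patterns, which this sketch ignores.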


Machine translation · Automatic induction · Transfer rule · Bilingual dictionary · Shallow transfer



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Helena M. Caseli (1), corresponding author
  • Maria das Graças V. Nunes (1)
  • Mikel L. Forcada (2)
  1. NILC – ICMC, University of São Paulo, São Carlos, Brazil
  2. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain