On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation

  • Helena de M. Caseli
  • Maria das Graças V. Nunes
  • Mikel L. Forcada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5249)


In this paper we present experiments on automatically learning two kinds of bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out on Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora, pt-en and pt-es. They were designed to investigate the relevance of two factors in the induction process: (1) the coverage of the linguistic resources used to preprocess the training corpora and (2) the maximum length threshold (for transfer rules) used in the induction process. The experiments indicate that both factors influence the automatic learning of bilingual resources.


Keywords: machine translation, bilingual resources, automatic learning, parallel corpora





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Helena de M. Caseli (1)
  • Maria das Graças V. Nunes (1)
  • Mikel L. Forcada (2)
  1. NILC – ICMC, University of São Paulo, São Carlos, Brazil
  2. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
