
Translation with Scarce Bilingual Resources

Machine Translation

Abstract

Machine translation of human languages is a field almost as old as computers themselves. Recent approaches to this challenging problem aim at learning translation knowledge automatically (or semi-automatically) from online text corpora, especially human-translated documents. For some language pairs, substantial translation resources exist, and these corpus-based systems can perform well. But for most language pairs, data is scarce, and current techniques do not work well. To examine the gap between human and machine translators, we created an experiment in which human beings were asked to translate an unknown language into English on the sole basis of a very small bilingual text. Participants performed quite well, and debriefings revealed a number of valuable strategies. We discuss these strategies and apply some of them to a statistical translation system.




Cite this article

Al-Onaizan, Y., Germann, U., Hermjakob, U. et al. Translation with Scarce Bilingual Resources. Machine Translation 17, 1–17 (2002). https://doi.org/10.1023/A:1025539822079


  • DOI: https://doi.org/10.1023/A:1025539822079
