Using a Large Monolingual Corpus to Improve Translation Accuracy

  • Radu Soricut
  • Kevin Knight
  • Daniel Marcu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2499)


The existence of a phrase in a large monolingual corpus is very useful information, and so is its frequency. We introduce an alternative approach to the automatic translation of phrases and sentences that operationalizes this observation. We use a statistical machine translation system to produce alternative translations and a large monolingual corpus to (re)rank them. Our results show that this combination yields better translations, especially for out-of-domain phrases and sentences. Our approach can also be used to automatically construct parallel corpora from monolingual resources.
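
The reranking idea can be illustrated with a minimal sketch. It is not the authors' system: the function names (`ngram_counts`, `corpus_frequency`, `rerank`), the toy corpus, and the candidate scores are all hypothetical, and the ordering rule (corpus frequency first, translation-model score as a tie-breaker) is an assumption chosen for simplicity.

```python
from collections import Counter


def ngram_counts(corpus_sentences, max_n=3):
    """Count every word n-gram of length 1..max_n in a monolingual corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def corpus_frequency(phrase, counts):
    """Return how often the phrase occurs in the corpus (0 if unseen
    or longer than the max_n used when counting)."""
    return counts[tuple(phrase.lower().split())]


def rerank(candidates, counts):
    """Reorder (translation, model_score) pairs.

    Assumption: candidates attested more often in the corpus rank higher;
    the translation-model score only breaks ties.
    """
    return sorted(
        candidates,
        key=lambda c: (corpus_frequency(c[0], counts), c[1]),
        reverse=True,
    )


# Toy monolingual corpus and hypothetical candidate translations.
corpus = ["she likes strong tea", "the strong tea was cold"]
counts = ngram_counts(corpus)
candidates = [("powerful tea", -1.0), ("strong tea", -1.5)]
best_translation = rerank(candidates, counts)[0][0]
```

Here the translation model alone would prefer "powerful tea", but only "strong tea" is attested in the corpus, so it is ranked first. A realistic setup would need a far larger corpus and some way of balancing the two signals rather than a strict ordering.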





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Radu Soricut
    • 1
  • Kevin Knight
    • 1
  • Daniel Marcu
    • 1
  1. Information Sciences Institute, University of Southern California, Marina del Rey
