Method to Build a Bilingual Lexicon for Speech-to-Speech Translation Systems

  • Keiji Yasuda
  • Andrew Finch
  • Eiichiro Sumita
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7182)

Abstract

Noun dropping and mistranslations occasionally occur in Machine Translation (MT) output. These errors can cause communication problems between system users. Some MT architectures are able to incorporate bilingual noun lexica, which can improve the translation quality of sentences that include nouns. In this paper, we propose an automatic method that enables a monolingual user to add new words to the lexicon. In the experiments, we compare the proposed method to three other methods. According to the experimental results, the proposed method gives the best performance in terms of both Character Error Rate (CER) and Word Error Rate (WER). The improvement over using only a transliteration system is very large: about 13 points in CER and 32 points in WER.
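
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common edit-distance definition of CER (total character edits divided by total reference characters) and treats each lexicon entry as a single word, so that WER is the fraction of entries whose output does not exactly match the reference. The paper may define the metrics differently; the function names and the sample entries are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """Assumed CER: summed character edits / summed reference length."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_chars = sum(len(ref) for ref, _ in pairs)
    return edits / ref_chars


def word_error_rate(pairs):
    """Assumed WER: share of entries that differ from the reference."""
    wrong = sum(1 for ref, hyp in pairs if ref != hyp)
    return wrong / len(pairs)


# Hypothetical (reference, system output) lexicon entries.
pairs = [("tokyo", "tokyo"), ("kyoto", "kyot")]
cer = character_error_rate(pairs)
wer = word_error_rate(pairs)
print(f"CER = {cer:.2f}, WER = {wer:.2f}")
```

Under these assumptions a single wrong character makes the whole entry count as a word error, which is one reason a WER difference can be much larger than the corresponding CER difference.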

Keywords

  • Machine Translation
  • Statistical Machine Translation
  • Computational Linguistics
  • Word Error Rate
  • Bilingual Lexicon
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Keiji Yasuda (1)
  • Andrew Finch (1)
  • Eiichiro Sumita (1)
  1. National Institute of Information and Communications Technology, Keihanna Science City, Japan
