Enhancing Pivot Translation Using Grammatical and Morphological Information

  • Hai-Long Trieu
  • Le-Minh Nguyen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 781)

Abstract

Pivot translation is one way to address the lack of large bilingual corpora for training statistical machine translation models. However, the conventional pivot method, which connects source and target phrases through shared pivot phrases, misses potential connections because it matches pivot phrases only by their surface form. In this work, we improve pivot translation by using grammatical and morphological information, rather than surface forms alone, to connect pivot phrases. Experiments were conducted on several low-resource Southeast Asian language pairs: Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese. By integrating grammatical and morphological information, the proposed method achieved a significant improvement of 0.5 BLEU points. This shows the effectiveness of integrating grammatical and morphological features into pivot translation.
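The abstract contrasts pivoting on surface forms with pivoting on grammatical and morphological information. The Python sketch below illustrates that difference under stated assumptions: it uses the standard phrase-table triangulation score, p(t|s) = Σ_p p(t|p) · p(p|s), and joins pivot phrases either on their exact surface form or on a hypothetical lemma-plus-POS key. The toy tables, their probabilities, and the helper names (`surface_key`, `lemma_pos_key`, `triangulate`) are invented for illustration. This is not the authors' implementation; the paper's keywords point to factored models, whose details are not reproduced here.

```python
from collections import defaultdict

# A pivot phrase is a tuple of (surface, lemma, coarse_pos) tokens.
# Hypothetical toy tables (Indonesian -> English pivot -> Vietnamese),
# with probabilities invented for illustration:
#   SRC_PIVOT[(src, pivot)] = p(pivot | src)
#   PIVOT_TGT[(pivot, tgt)] = p(tgt | pivot)
SRC_PIVOT = {
    ("pergi", (("went", "go", "VERB"),)): 0.7,
    ("pergi", (("left", "leave", "VERB"),)): 0.3,
}
PIVOT_TGT = {
    ((("goes", "go", "VERB"),), "đi"): 0.8,
    ((("left", "leave", "VERB"),), "rời"): 0.5,
}


def surface_key(phrase):
    """Conventional pivoting: pivot phrases match only on exact surface form."""
    return tuple(surface for surface, _, _ in phrase)


def lemma_pos_key(phrase):
    """Assumed factored matching: pivot phrases match on their lemma + POS sequence."""
    return tuple((lemma, pos) for _, lemma, pos in phrase)


def triangulate(src_pivot, pivot_tgt, key):
    """Score p(t|s) = sum_p p(t|p) * p(p|s), joining pivot phrases with equal key().

    With a non-surface key several pivot phrases can share one key, so the
    resulting scores are not guaranteed to be normalized."""
    index = defaultdict(list)  # key(pivot) -> [(tgt, p(tgt|pivot)), ...]
    for (pivot, tgt), p_t_given_p in pivot_tgt.items():
        index[key(pivot)].append((tgt, p_t_given_p))

    scores = defaultdict(float)
    for (src, pivot), p_p_given_s in src_pivot.items():
        # .get() avoids inserting empty entries into the index
        for tgt, p_t_given_p in index.get(key(pivot), []):
            scores[(src, tgt)] += p_p_given_s * p_t_given_p
    return dict(scores)


if __name__ == "__main__":
    print(triangulate(SRC_PIVOT, PIVOT_TGT, surface_key))
    print(triangulate(SRC_PIVOT, PIVOT_TGT, lemma_pos_key))
```

With `surface_key`, only the pair ("pergi", "rời") is produced, because the pivot phrases "went" and "goes" never match. With `lemma_pos_key`, "pergi" is also connected to "đi" through the shared lemma "go", which is the kind of missed connection the abstract describes. A real system would typically renormalize or combine such scores with the surface-based table.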

Keywords

Statistical machine translation, Pivot methods, Factored models, Part-of-speech tags, Lemma forms

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan