
Independent Fusion of Words and Image for Multimodal Machine Translation

  • Junteng Ma
  • Shihao Qin
  • Minping Chen
  • Xia Li
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1104)

Abstract

Multimodal machine translation, which combines the visual information of an image with the source text, has become one of the research hotspots in recent years. Most existing works project the image feature into the text semantic space and merge it into the model in different ways. In practice, however, different source words may capture different visual information. We therefore propose a multimodal neural machine translation (MNMT) model that integrates the words and the visual information of the image independently. The word itself and the visual information of the image regions most similar to that word are fused independently into the word's textual semantics, thereby reinforcing both the textual semantics and the corresponding visual information of each word. These fused representations are then used to compute the context vector of the decoder's attention. We carry out experiments on the original English-German sentence pairs of the multimodal machine translation dataset Multi30k, as well as on manually annotated Indonesian-Chinese sentence pairs. Compared with existing RNN-based MNMT models, our model achieves better performance, demonstrating its effectiveness.
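The mechanism the abstract describes (per-word attention over image regions, followed by an independent fusion of each word with its attended visual vector, whose outputs feed the decoder's attention) can be sketched in code. Since the paper's equations are not reproduced on this page, the following PyTorch snippet is a minimal sketch of the general idea only; the module, the gated fusion scheme, and all layer names and dimensions are our own assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IndependentWordImageFusion(nn.Module):
    """Fuses each source word with its own attended visual vector.

    H: encoder hidden states (batch, src_len, text_dim)
    V: image region features, e.g. ResNet conv maps (batch, n_regions, img_dim)
    Returns fused states of shape (batch, src_len, text_dim) that could
    replace the plain text states when the decoder's attention computes
    its context vector. This is an illustrative sketch, not the paper's code.
    """

    def __init__(self, text_dim: int, img_dim: int):
        super().__init__()
        # Project image regions into the text semantic space.
        self.img_proj = nn.Linear(img_dim, text_dim)
        # Gate controlling how much visual information each word absorbs.
        self.gate = nn.Linear(2 * text_dim, text_dim)

    def forward(self, H: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
        Vp = self.img_proj(V)                          # (B, R, D)
        # Word-region similarity scores: each word attends independently.
        scores = torch.bmm(H, Vp.transpose(1, 2))      # (B, T, R)
        alpha = F.softmax(scores, dim=-1)              # per-word region weights
        vis = torch.bmm(alpha, Vp)                     # word-specific visual vector (B, T, D)
        # Gated fusion: the word itself plus its own visual evidence.
        g = torch.sigmoid(self.gate(torch.cat([H, vis], dim=-1)))
        return H + g * vis


# Toy usage with dummy tensors (512-d text states, 7x7 ResNet regions).
fusion = IndependentWordImageFusion(text_dim=512, img_dim=2048)
H = torch.randn(8, 20, 512)
V = torch.randn(8, 49, 2048)
H_fused = fusion(H, V)   # (8, 20, 512), one fused vector per word
```

Under these assumptions, H_fused would stand in for the plain encoder states wherever the decoder's attention (e.g. a Luong- or Bahdanau-style mechanism) computes its context vector over the source.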

Keywords

Multimodal machine translation · Image visual feature · Independent fusion · Attention mechanism


Acknowledgements

This work is supported by the National Natural Science Foundation of China (61976062) and the Science and Technology Program of Guangzhou, China (201904010303).


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
  2. Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangzhou, China
