Abstract
Different types of media data are represented by underlying features of differing dimensions and attributes, which makes them heterogeneous and not directly comparable. Moreover, exploiting the low-level features extracted across media requires additional information from the different modalities, so there is a trade-off between bridging these modality differences and the semantic ambiguity that arises when only low-level features are used. These problems make traditional feature-learning methods unsuitable for cross-media analysis. This paper proposes a neural network architecture that combines feature extraction with contextual semantics and introduces a new attention mechanism. Experiments show that the proposed architecture achieves better BLEU scores than traditional methods.
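The abstract evaluates caption quality with BLEU but does not specify the variant used. As a reference point, sentence-level BLEU (geometric mean of modified n-gram precisions times a brevity penalty) can be sketched as follows; the function name, whitespace tokenization, and `max_n=4` default are illustrative assumptions, not details from the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of modified (clipped)
    n-gram precisions, scaled by a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            return 0.0  # candidate too short for this n-gram order
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        prec = clipped / sum(cand_counts.values())
        if prec == 0:
            return 0.0
        log_prec_sum += math.log(prec) / max_n
    # Brevity penalty against the closest-length reference.
    closest = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = 1.0 if len(cand) > len(closest) else math.exp(1 - len(closest) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum)
```

A perfect match scores 1.0, and a candidate sharing no unigrams with any reference scores 0.0; corpus-level BLEU (as typically reported in captioning papers) aggregates clipped counts over all sentences before taking the geometric mean.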
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gao, L., Chen, Z., Wang, L., Li, B., Nie, L., Zheng, F. (2022). Cross-Media and Multilingual Image Understanding Method Based on Attention Mechanism. In: Li, X. (ed.) Advances in Intelligent Automation and Soft Computing. IASC 2021. Lecture Notes on Data Engineering and Communications Technologies, vol. 80. Springer, Cham. https://doi.org/10.1007/978-3-030-81007-8_90
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81006-1
Online ISBN: 978-3-030-81007-8
eBook Packages: Intelligent Technologies and Robotics (R0)